PREFACE


At the meeting of the Mathematical Association of 
America held in the Summer of 1955, I had the privilege of
delivering the Hedrick Lectures. I was highly gratified 
when, sometime later, Professor T. Rado, on behalf of the 
Committee on Carus Monographs, kindly invited me to 
expand my lectures into a monograph. 

At about the same time I was honored by an invitation 
from Haverford College to deliver a series of lectures under 
the Philips Visitors Program. This invitation gave me an 
opportunity to try out the projected monograph on a “live” 
audience, and this book is a slightly revised version of my 
lectures delivered at Haverford College during the Spring 
Term of 1958. 

My principal aim in the original Hedrick Lectures, as 
well as in this enlarged version, was to show that (a) 
extremely simple observations are often the starting point 
of rich and fruitful theories and (b) many seemingly unrelated
developments are in reality variations on the same
simple theme. 

Except for the last chapter where I deal with a spec- 
tacular application of the ergodic theorem to continued 
fractions, the book is concerned with the notion of statisti- 
cal independence. 

This notion originated in probability theory and for a 
long time was handled with vagueness which bred suspicion
as to its being a bona fide mathematical notion.

We now know how to define statistical independence in 
most general and abstract terms. But the modern trend 
toward generality and abstraction tended not only to 
submerge the simplicity of the underlying idea but also to 
obscure the possibility of applying probabilistic ideas 
outside the field of probability theory. 

In the pages that follow, I have tried to rescue statistical 
independence from the fate of abstract oblivion by showing 
how in its simplest form it arises in various contexts 
cutting across different mathematical disciplines. 

As to the preparation of the reader, I assume his famili- 
arity with Lebesgue’s theory of measure and integration, 
elementary theory of Fourier integrals, and rudiments of 
number theory. Because I do not want to assume much 
more and in order not to encumber the narrative by too 
many technical details I have left out proofs of some state- 
ments. 

I apologize for these omissions and hope that the reader 
will become sufficiently interested in the subject to fill these 
gaps by himself. I have appended a small bibliography 
which makes no pretence at completeness. 

Throughout the book I have also put in a number of 
problems. These problems are mostly quite difficult, and 
the reader should not feel discouraged if he cannot solve 
them without considerable effort. 

I wish to thank Professor C. O. Oakley and R. J. Wisner
of Haverford College for their splendid cooperation and for 
turning the chore of traveling from Ithaca to Haverford 
into a real pleasure. 

I was fortunate in having as members of my audience 
Professor H. Rademacher of the University of Pennsyl- 
vania and Professor John Oxtoby of Bryn Mawr College. 
Their criticism, suggestions, and constant encouragement
have been truly invaluable, and my debt to them is great. 

My Cornell colleagues, Professors H. Widom and 
M. Schreiber, have read the manuscript and are respon- 
sible for a good many changes and improvements. It is 
a pleasure to thank them for their help. 

My thanks go also to the Haverford and Bryn Mawr 
undergraduates, who were the “guinea pigs,” and especially 
to J. Reill who compiled the bibliography and proofread 
the manuscript. 

Last but not least, I wish to thank Mrs. Axelsson of 
Haverford College and Miss Martin of the Cornell Mathe- 
matics Department for the often impossible task of typing 
the manuscript from my nearly illegible notes. 

Mark Kac 

Ithaca, New York 

September, 1959 
CHAPTER 1


FROM VIETA TO THE NOTION OF 
STATISTICAL INDEPENDENCE 


1. A formula of Vieta. We start from simple trigo- 
nometry. Write 

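$$
\begin{aligned}
\sin x &= 2 \sin\frac{x}{2}\cos\frac{x}{2} \\
&= 2^2 \sin\frac{x}{4}\cos\frac{x}{4}\cos\frac{x}{2} \\
&= 2^3 \sin\frac{x}{8}\cos\frac{x}{8}\cos\frac{x}{4}\cos\frac{x}{2} \\
&\;\;\vdots \\
&= 2^n \sin\frac{x}{2^n}\prod_{k=1}^{n}\cos\frac{x}{2^k}.
\end{aligned}
\tag{1.1}
$$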

From elementary calculus we know that, for $x \neq 0$,

$$1 = \lim_{n\to\infty}\frac{\sin\dfrac{x}{2^n}}{\dfrac{x}{2^n}} = \frac{1}{x}\lim_{n\to\infty} 2^n \sin\frac{x}{2^n},$$

and hence

$$\lim_{n\to\infty} 2^n \sin\frac{x}{2^n} = x. \tag{1.2}$$
Combining (1.2) with (1.1), we get

$$\frac{\sin x}{x} = \prod_{k=1}^{\infty}\cos\frac{x}{2^k}. \tag{1.3}$$


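A special case of (1.3) is of particular interest. Setting
$x = \pi/2$, we obtain

$$\frac{2}{\pi} = \prod_{n=1}^{\infty}\cos\frac{\pi}{2^{n+1}} = \frac{\sqrt{2}}{2}\cdot\frac{\sqrt{2+\sqrt{2}}}{2}\cdot\frac{\sqrt{2+\sqrt{2+\sqrt{2}}}}{2}\cdots, \tag{1.4}$$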

a classical formula due to Vieta. 
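Both (1.3) and Vieta's formula are easy to check numerically. The following is a minimal Python sketch, not part of the original text; the function names `vieta_partial_product` and `cosine_product` are illustrative choices. It multiplies out the first few factors and compares the results with $2/\pi$ and $\sin x / x$.

```python
import math


def vieta_partial_product(terms: int) -> float:
    """Product of the first `terms` factors sqrt(2)/2, sqrt(2+sqrt(2))/2, ..."""
    product = 1.0
    nested = 0.0  # running nested radical sqrt(2 + sqrt(2 + ...))
    for _ in range(terms):
        nested = math.sqrt(2.0 + nested)
        product *= nested / 2.0
    return product


def cosine_product(x: float, terms: int) -> float:
    """Partial product cos(x/2) cos(x/4) ... cos(x/2**terms) from (1.3)."""
    product = 1.0
    for k in range(1, terms + 1):
        product *= math.cos(x / 2.0 ** k)
    return product


if __name__ == "__main__":
    print(vieta_partial_product(25), 2.0 / math.pi)
    print(cosine_product(1.0, 25), math.sin(1.0) / 1.0)
```

By (1.1) the partial product with n factors equals $\sin x / (2^n \sin(x/2^n))$, so the error shrinks roughly like $4^{-n}$ and a couple of dozen factors already reach double precision.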

2. Another look at Vieta’s formula. So far every- 
thing has been straightforward and familiar. 

Now let us take a look at (1.3) from a different point of 
view. 

It is known that every real number t, 0 < t < 1, can be
written uniquely in the form

$$t = \frac{\varepsilon_1}{2} + \frac{\varepsilon_2}{2^2} + \cdots, \tag{2.1}$$

where each $\varepsilon$ is either 0 or 1.

This is the familiar binary expansion of t, and to ensure 
uniqueness we agree to write terminating expansions in the 
form in which all digits from a certain point on are 0. 
Thus, for example, we write 

$$\frac{3}{4} = \frac{1}{2} + \frac{1}{2^2} + \frac{0}{2^3} + \frac{0}{2^4} + \cdots$$

rather than

$$\frac{3}{4} = \frac{1}{2} + \frac{0}{2^2} + \frac{1}{2^3} + \frac{1}{2^4} + \cdots.$$


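The digits $\varepsilon_i$ are, of course, functions of t, and it is more
appropriate to write (2.1) in the form

$$t = \frac{\varepsilon_1(t)}{2} + \frac{\varepsilon_2(t)}{2^2} + \frac{\varepsilon_3(t)}{2^3} + \cdots. \tag{2.2}$$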

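With the convention about terminating expansions, the
graphs of $\varepsilon_1(t), \varepsilon_2(t), \varepsilon_3(t), \cdots$ are as follows:

[Figure: graphs of $\varepsilon_1(t)$, $\varepsilon_2(t)$, $\varepsilon_3(t)$ on the interval (0, 1).]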

It is more convenient to introduce the functions $r_k(t)$ defined
by the equations

$$r_k(t) = 1 - 2\varepsilon_k(t), \qquad k = 1, 2, 3, \cdots, \tag{2.3}$$

whose graphs look as follows:

[Figure: graphs of the functions $r_k(t)$ on the interval (0, 1).]

These functions, first introduced and studied by H. Rade- 
macher, are known as Rademacher functions. In terms of 
the functions $r_k(t)$, we can rewrite (2.2) in the form

$$1 - 2t = \sum_{k=1}^{\infty}\frac{r_k(t)}{2^k}. \tag{2.4}$$
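The binary digits and the Rademacher functions are simple to compute exactly. Here is a minimal sketch, assuming Python's `fractions.Fraction` for exact arithmetic; the helper names `binary_digit`, `rademacher` and `partial_sum` are hypothetical and not from the text. It follows the convention that terminating expansions end in zeros, and it evaluates the partial sums of (2.4).

```python
from fractions import Fraction


def binary_digit(t: Fraction, k: int) -> int:
    """k-th binary digit of t in [0, 1); terminating expansions end in zeros."""
    return int(t * 2 ** k) % 2


def rademacher(t: Fraction, k: int) -> int:
    """Rademacher function r_k(t) = 1 - 2 * (k-th binary digit of t), as in (2.3)."""
    return 1 - 2 * binary_digit(t, k)


def partial_sum(t: Fraction, n: int) -> Fraction:
    """Sum of r_k(t) / 2**k for k = 1, ..., n, the n-th partial sum of (2.4)."""
    return sum(Fraction(rademacher(t, k), 2 ** k) for k in range(1, n + 1))


if __name__ == "__main__":
    t = Fraction(3, 4)
    print([binary_digit(t, k) for k in range(1, 5)])  # digits of 3/4
    print(1 - 2 * t, partial_sum(t, 10))
```

Since the first n digits determine t to within $2^{-n}$, the n-th partial sum differs from $1 - 2t$ by at most $2^{-n}$, which is the uniform convergence used later in § 3.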
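Now notice that

$$\int_0^1 e^{ix(1-2t)}\,dt = \frac{\sin x}{x}$$

and

$$\int_0^1 \exp\left(ix\,\frac{r_k(t)}{2^k}\right)dt = \cos\frac{x}{2^k}.$$

Formula (1.3) now assumes the form

$$\frac{\sin x}{x} = \int_0^1 e^{ix(1-2t)}\,dt = \int_0^1 \exp\left(ix\sum_{k=1}^{\infty}\frac{r_k(t)}{2^k}\right)dt = \prod_{k=1}^{\infty}\cos\frac{x}{2^k} = \prod_{k=1}^{\infty}\int_0^1 \exp\left(ix\,\frac{r_k(t)}{2^k}\right)dt,$$

and, in particular, we have

$$\int_0^1 \prod_{k=1}^{\infty}\exp\left(ix\,\frac{r_k(t)}{2^k}\right)dt = \prod_{k=1}^{\infty}\int_0^1 \exp\left(ix\,\frac{r_k(t)}{2^k}\right)dt. \tag{2.5}$$

An integral of a product is a product of integrals!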



3. An accident or a beginning of something 
deeper? Can we dismiss (2.5) as an accident? Certainly 
not until we have investigated the matter more closely. 
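Let us take a look at the function

$$\sum_{k=1}^{n} c_k r_k(t).$$

It is a step function which is constant over the intervals

$$\left(\frac{s}{2^n}, \frac{s+1}{2^n}\right), \qquad s = 0, 1, \cdots, 2^n - 1,$$

and the values which it assumes are of the form

$$\pm c_1 \pm c_2 \pm \cdots \pm c_n.$$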
Every sequence (of length n) of +1’s and −1’s corresponds
to one and only one interval $(s/2^n, (s+1)/2^n)$. Thus

$$\int_0^1 \exp\left[i\sum_{k=1}^{n} c_k r_k(t)\right]dt = \frac{1}{2^n}\sum \exp\left(i\sum_{k=1}^{n} \pm c_k\right),$$

where the outside summation is over all possible sequences
(of length n) of +1’s and −1’s.

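Now

$$\frac{1}{2^n}\sum \exp\left(i\sum_{k=1}^{n} \pm c_k\right) = \prod_{k=1}^{n}\left(\frac{e^{ic_k} + e^{-ic_k}}{2}\right) = \prod_{k=1}^{n}\cos c_k,$$

and consequently

$$\int_0^1 \exp\left[i\sum_{k=1}^{n} c_k r_k(t)\right]dt = \prod_{k=1}^{n}\cos c_k = \prod_{k=1}^{n}\int_0^1 e^{ic_k r_k(t)}\,dt. \tag{3.1}$$

Setting

$$c_k = \frac{x}{2^k},$$

we obtain

$$\int_0^1 \exp\left(ix\sum_{k=1}^{n}\frac{r_k(t)}{2^k}\right)dt = \prod_{k=1}^{n}\cos\frac{x}{2^k},$$

and, since

$$\lim_{n\to\infty}\sum_{k=1}^{n}\frac{r_k(t)}{2^k} = 1 - 2t$$

uniformly in (0, 1), we have

$$\frac{\sin x}{x} = \int_0^1 e^{ix(1-2t)}\,dt = \lim_{n\to\infty}\int_0^1 \exp\left(ix\sum_{k=1}^{n}\frac{r_k(t)}{2^k}\right)dt = \lim_{n\to\infty}\prod_{k=1}^{n}\cos\frac{x}{2^k} = \prod_{k=1}^{\infty}\cos\frac{x}{2^k}.$$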
We have thus obtained a different proof of formula
(1.3). Is it a better proof than the one given in § 1? 

It is more complicated, but it is also more instructive 
because it somehow connects Vieta’s formula with binary 
digits. 
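The finite identity (3.1) behind this proof can be verified directly. The sketch below is an illustration with hypothetical function names, not the author's code. It averages $\exp(i\sum \pm c_k)$ over all $2^n$ sign patterns, one pattern per dyadic interval of length $2^{-n}$, and compares the result with the product of cosines.

```python
import cmath
import math
from itertools import product


def integral_by_enumeration(c):
    """Integral of exp(i * sum c_k r_k(t)) over (0, 1).

    The integrand is constant on each of the 2**n dyadic intervals, and each
    interval corresponds to exactly one pattern of signs, so the integral is
    the plain average over all sign patterns.
    """
    n = len(c)
    total = 0j
    for signs in product((1, -1), repeat=n):
        total += cmath.exp(1j * sum(s * ck for s, ck in zip(signs, c)))
    return total / 2 ** n


def cosine_product(c):
    """Right-hand side of (3.1): the product of cos(c_k)."""
    return math.prod(math.cos(ck) for ck in c)


if __name__ == "__main__":
    c = [0.3, 1.1, 2.0, 0.7]
    print(integral_by_enumeration(c), cosine_product(c))
```

Choosing $c_k = x/2^k$ in this sketch reproduces the partial products of (1.3), so the same enumeration also approximates $\sin x / x$.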

What is the property of binary digits that makes the 
proof tick? 

4. $(\tfrac{1}{2})^n = \tfrac{1}{2}\cdots\tfrac{1}{2}$ (n times). Consider the set of t’s
for which

$$r_1(t) = +1, \quad r_2(t) = -1, \quad r_3(t) = -1.$$

One look at the graphs of $r_1$, $r_2$, and $r_3$ will tell us this
set (except possibly for end points) is simply the interval
$(\tfrac{3}{8}, \tfrac{4}{8})$.

The length (or measure) of this interval is clearly $\tfrac{1}{8}$, and

$$\frac{1}{8} = \frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{2}.$$

This trivial observation can be written in the form

$$\mu\{r_1(t) = +1,\ r_2(t) = -1,\ r_3(t) = -1\} = \mu\{r_1(t) = +1\}\,\mu\{r_2(t) = -1\}\,\mu\{r_3(t) = -1\},$$

where $\mu$ stands for measure (length) of the set defined
inside the braces.

The reader will have no difficulty in generalizing this
to an arbitrary number of r’s. He will then get the following
result: If $\delta_1, \cdots, \delta_n$ is a sequence of +1’s and −1’s then

$$\mu\{r_1(t) = \delta_1, \cdots, r_n(t) = \delta_n\} = \mu\{r_1(t) = \delta_1\}\,\mu\{r_2(t) = \delta_2\}\cdots\mu\{r_n(t) = \delta_n\}. \tag{4.1}$$

This may seem to be merely a complicated way of writing

$$\left(\tfrac{1}{2}\right)^n = \tfrac{1}{2}\times\tfrac{1}{2}\times\cdots\times\tfrac{1}{2} \quad (n \text{ times}),$$
but in reality it is much more. It expresses a deep property
of the functions $r_k(t)$ (and hence binary digits) and is a
starting point of a rich and fruitful development. It is
this property which is at the heart of the proof of § 3. For
(3.1) can now be proved as follows:

$$
\begin{aligned}
\int_0^1 \exp\left[i\sum_{k=1}^{n} c_k r_k(t)\right]dt
&= \sum_{\delta_1,\cdots,\delta_n}\exp\left(i\sum_{k=1}^{n} c_k\delta_k\right)\mu\{r_1(t) = \delta_1, \cdots, r_n(t) = \delta_n\} \\
&= \sum_{\delta_1,\cdots,\delta_n}\ \prod_{k=1}^{n} e^{ic_k\delta_k}\ \prod_{k=1}^{n}\mu\{r_k(t) = \delta_k\} \\
&= \sum_{\delta_1,\cdots,\delta_n}\ \prod_{k=1}^{n} e^{ic_k\delta_k}\,\mu\{r_k(t) = \delta_k\} \\
&= \prod_{k=1}^{n}\ \sum_{\delta_k} e^{ic_k\delta_k}\,\mu\{r_k(t) = \delta_k\} \\
&= \prod_{k=1}^{n}\int_0^1 e^{ic_k r_k(t)}\,dt.
\end{aligned}
$$
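The product rule for the measures of the sets $\{r_k(t) = \delta_k\}$ can be checked exactly for small n. This is a minimal sketch under the assumption that measures are computed by counting dyadic intervals; `measure` and `joint_equals_product` are illustrative names. It works because $r_1, \cdots, r_n$ are constant on each interval $[s/2^n, (s+1)/2^n)$, so testing the left end points is enough.

```python
from fractions import Fraction
from itertools import product


def rademacher(t: Fraction, k: int) -> int:
    """r_k(t) = 1 - 2 * (k-th binary digit of t)."""
    return 1 - 2 * (int(t * 2 ** k) % 2)


def measure(condition, n: int) -> Fraction:
    """Lebesgue measure of {t : condition((r_1(t), ..., r_n(t))) holds}."""
    hits = 0
    for s in range(2 ** n):
        t = Fraction(s, 2 ** n)  # left end point of the s-th dyadic interval
        values = tuple(rademacher(t, k) for k in range(1, n + 1))
        if condition(values):
            hits += 1
    return Fraction(hits, 2 ** n)


def joint_equals_product(n: int) -> bool:
    """Check the product rule for every sign pattern of length n."""
    for deltas in product((1, -1), repeat=n):
        joint = measure(lambda v: v == deltas, n)
        marginals = Fraction(1)
        for k in range(n):
            marginals *= measure(lambda v, k=k: v[k] == deltas[k], n)
        if joint != marginals:
            return False
    return True


if __name__ == "__main__":
    print(measure(lambda v: v == (1, -1, -1), 3))
    print(joint_equals_product(4))
```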


5. Heads or tails? The elementary theory of coin 
tossing starts with two assumptions: 

a. The coin is “fair.” 

b. The successive tosses are independent. 

The first assumption means that in each individual toss
the alternatives H (heads) and T (tails) are equiprobable,
i.e., each is assigned “probability” $\tfrac{1}{2}$. The second is used
to justify the “rule of multiplication of probabilities.”
This rule (stated in vague terms) is as follows: If events
$A_1, \cdots, A_n$ are independent, then the probability of their
joint occurrence is the product of the probabilities of their
individual occurrences. In other words:

$$\text{Prob.}\{A_1 \text{ and } A_2 \text{ and } A_3 \cdots \text{ and } A_n\} = \text{Prob.}\{A_1\}\cdot\text{Prob.}\{A_2\}\cdots\text{Prob.}\{A_n\}. \tag{5.1}$$

Applied to independent tosses of a fair coin, the rule tells 
us that the probability associated with any given pattern 
(of length n) of H’s and T’s (e.g., HHTT ··· T) is

$$\frac{1}{2}\times\frac{1}{2}\times\cdots\times\frac{1}{2} = \frac{1}{2^n}.$$


This is quite reminiscent of § 4, and we can use the functions
$r_k(t)$ as a model for coin tossing. To accomplish this,
we make the following dictionary of terms:

Symbol H                       +1
Symbol T                       −1
kth toss (k = 1, 2, ···)       $r_k(t)$ (k = 1, 2, ···)
Event                          Set of t’s
Probability of an event        Measure of the corresponding set of t’s

To see how to apply this dictionary, let us consider the 
following problem: Find the probability that in n inde- 
pendent tosses of a fair coin, exactly l will be heads. Using 
the dictionary we translate the problem to read : 

Find the measure of the set of t’s such that exactly l of
the n numbers $r_1(t), r_2(t), \cdots, r_n(t)$ are equal to +1. We
can solve this problem (without the usual recourse to combinations)
by a device which we shall meet (under different
guises) many times in the sequel.

First of all, notice that the condition that exactly l
among $r_1(t), \cdots, r_n(t)$ are equal to 1 is equivalent to the
condition

$$r_1(t) + r_2(t) + \cdots + r_n(t) = 2l - n. \tag{5.2}$$
Next notice that, for m an integer, one has

$$\frac{1}{2\pi}\int_0^{2\pi} e^{imx}\,dx = \begin{cases} 1, & m = 0, \\ 0, & m \neq 0, \end{cases} \tag{5.3}$$

and consequently

$$\phi(t) = \frac{1}{2\pi}\int_0^{2\pi} e^{ix[r_1(t) + \cdots + r_n(t) - (2l - n)]}\,dx \tag{5.4}$$

is equal to 1 if (5.2) is satisfied and is equal to 0 otherwise.
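Thus,

$$
\begin{aligned}
\mu\{r_1(t) + \cdots + r_n(t) = 2l - n\} &= \int_0^1 \phi(t)\,dt \\
&= \int_0^1 \frac{1}{2\pi}\int_0^{2\pi} e^{ix[r_1(t) + \cdots + r_n(t) - (2l - n)]}\,dx\,dt \\
&= \frac{1}{2\pi}\int_0^{2\pi} e^{-i(2l - n)x}\left(\int_0^1 e^{ix[r_1(t) + \cdots + r_n(t)]}\,dt\right)dx.
\end{aligned}
$$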


(The last step involves interchange of the order of integra- 
tion. This is usually justified by appealing to a general 
theorem of Fubini. In our case the justification is trivial 
since $r_1(t) + \cdots + r_n(t)$ is a step function.)

Now recall (3.1); use it with $c_1 = c_2 = \cdots = c_n = x$, and
obtain

$$\mu\{r_1(t) + \cdots + r_n(t) = 2l - n\} = \frac{1}{2\pi}\int_0^{2\pi} e^{-i(2l - n)x}\cos^n x\,dx. \tag{5.5}$$

Finally we leave it as an exercise to show that

$$\mu\{r_1(t) + \cdots + r_n(t) = 2l - n\} = \frac{1}{2^n}\binom{n}{l}. \tag{5.6}$$
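The integral in (5.5) can be evaluated numerically and compared with the familiar count $\binom{n}{l}/2^n$ of patterns with exactly l heads. The following is a sketch rather than a proof; the function name and the choice of $2n + 2$ sample points are assumptions made for illustration. A uniform Riemann sum is used because it integrates a trigonometric polynomial exactly once the number of sample points exceeds the highest frequency, which here is at most $2n$.

```python
import cmath
import math


def measure_via_integral(n: int, l: int) -> float:
    """(1 / 2 pi) * integral over (0, 2 pi) of exp(-i (2l - n) x) cos(x)**n dx."""
    samples = 2 * n + 2  # more points than the highest frequency 2n
    total = 0j
    for j in range(samples):
        x = 2.0 * math.pi * j / samples
        total += cmath.exp(-1j * (2 * l - n) * x) * math.cos(x) ** n
    return (total / samples).real


if __name__ == "__main__":
    n = 10
    for l in range(n + 1):
        print(l, measure_via_integral(n, l), math.comb(n, l) / 2 ** n)
```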






6. Independence and “Independence.” The notion 
of independence, though of central importance in probability
theory, is not a purely mathematical notion. The
rule of multiplication of probabilities of independent 
events is an attempt to formalize this notion and to build a 
calculus around it. One is naturally inclined to consider 
events which seem unrelated as being independent of each 
other. Thus a physicist considering events taking place 
in two samples of a gas far removed from each other will 
consider them as independent (how could they be other- 
wise if one sample is, say, in Bismarck, N. D., and the 
other in Washington, D. C.?) and will cheerfully invoke 
the rule of multiplication of probabilities. 

Unfortunately, in so doing he may (innocently and un- 
wittingly) create the impression that what is involved 
here is a strict logical implication. 

What is really involved is a definition of independence 
and a belief (borne out by experience and experiment, to 
be sure) that the definition is applicable to a particular 
situation. 

There is, thus, independence in a vague and intuitive 
sense, and there is “independence” in the narrow but well- 
defined sense that the rule of multiplication of probabilities 
is applicable. 

It was the vague and intuitive notions that provided 
for a long time the main motivation and driving force 
behind probability theory. 

And while an impressive formalism was being created, 
mathematicians (with very few exceptions) remained 
aloof because it was not clear to them what the objects 
were to which the formalism was applicable.* 

* Imagine a book on differential equations written solely in terms
of masses, forces, accelerations, and the like falling into the hands
of someone who has never heard of mechanics. The rich purely
mathematical content of such a book could well be lost to this
hypothetical reader.

Then in 1909, E. Borel made the observation that the
binary digits $\varepsilon_k(t)$ [or equivalently the Rademacher functions
$r_k(t)$] were “independent” [see (4.1)].

At long last, there were well-defined objects to which 
probability theory for independent events could be applied 
without fear of getting involved with coins, events, tosses, 
and experiments. 

The appearance of Borel’s classical memoir “Sur les
probabilités dénombrables et leurs applications arithmétiques”
marks the beginning of modern probability theory,
and in the next chapter we shall discuss some of the lines 
along which the theory developed. 


PROBLEMS 


1. Write the ternary expansion of t, 0 < t < 1, in the form

$$t = \frac{\eta_1(t)}{3} + \frac{\eta_2(t)}{3^2} + \frac{\eta_3(t)}{3^3} + \cdots$$

(each $\eta$ can assume values 0, 1, and 2), and prove that the $\eta$’s are
independent.

2. Prove that

$$\frac{\sin x}{x} = \prod_{k=1}^{\infty}\frac{1 + 2\cos\dfrac{2x}{3^k}}{3}$$

and generalize it. 

3. Prove that if $k_1 < k_2 < \cdots < k_s$ then

$$\int_0^1 r_{k_1}(t)\,r_{k_2}(t)\cdots r_{k_s}(t)\,dt = 0.$$


4. Let 2n (an even positive integer) be written in binary notation

$$2n = 2^{n_1} + 2^{n_2} + \cdots + 2^{n_k}, \qquad 1 \le n_1 < n_2 < \cdots < n_k,$$

and define the functions $w_n(t)$ (the Walsh-Kaczmarz functions) as
follows:

$$w_0(t) = 1, \qquad w_n(t) = r_{n_1}(t)\cdots r_{n_k}(t), \quad n \ge 1.$$

Prove that

(a) $$\int_0^1 w_m(t)\,w_n(t)\,dt = \delta_{m,n}.$$

(b) If f(t) is integrable and

$$\int_0^1 f(t)\,w_n(t)\,dt = 0, \qquad n = 0, 1, 2, \cdots,$$

then f(t) = 0 almost everywhere.


(c) $$\int_0^1\!\!\int_0^1 \left|\sum_{k=0}^{2^n} w_k(t)\,w_k(s)\right| dt\,ds = $$

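5. Using the formula

$$|z| = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1 - \cos zx}{x^2}\,dx$$

prove first that

$$\int_0^1 \left|\sum_{k=1}^{n} r_k(t)\right| dt = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1 - \cos^n x}{x^2}\,dx > \frac{1}{\pi}\int_{-1/\sqrt{n}}^{1/\sqrt{n}}\frac{1 - \cos^n x}{x^2}\,dx$$

and finally that

$$\int_0^1 \left|\sum_{k=1}^{n} r_k(t)\right| dt > A\sqrt{n}$$

with

$$A = \frac{1}{\pi}\int_{-1}^{1}\frac{1 - e^{-y^2/2}}{y^2}\,dy.$$

Note: Schwarz’s inequality combined with the result of Problem 3
for s = 2 gives

$$\int_0^1 \left|\sum_{k=1}^{n} r_k(t)\right| dt \le \sqrt{n}.$$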
CHAPTER 2


BOREL AND AFTER


1. “Laws of large numbers.” You have all heard 
that if you play a fair game of chance, then, in the long run, 
it is unlikely that you will get rich. “The law of averages 
will take care of it” is what one hears uttered wisely in this 
and similar connections. What is this “law of averages”? 
Is it some sort of a physical law, or is it a purely mathe- 
matical statement? It is mostly the latter, although the 
agreement with experimental evidence is remarkably 
good. Let us forget about experimental evidence and 
concentrate on the mathematical issues. Suppose I toss 
a “fair” coin, winning $1 each time H comes up and losing 
$1 each time T comes up. What can I say about my 
fortune after n tosses? Using our dictionary of § 4,
Chapter 1, we can represent this fortune by

$$r_1(t) + r_2(t) + \cdots + r_n(t). \tag{1.1}$$

The question of obvious interest to the player is what are
his chances that, after n tosses, his fortune exceeds a
prescribed number $A_n$. Again by our dictionary, this is
equivalent to asking for the measure of the set of t’s for
which

$$r_1(t) + r_2(t) + \cdots + r_n(t) > A_n. \tag{1.2}$$

If it is indeed unlikely that I shall get rich by playing this
game, then if $A_n$ is “sufficiently large” the measure of the
set defined by (1.2) should be “small.” (Similarly, it
should also be unlikely to lose more than $A_n$.) We make
all this precise by proving the following theorem:

For every $\varepsilon > 0$,

$$\lim_{n\to\infty}\mu\{|r_1(t) + \cdots + r_n(t)| > \varepsilon n\} = 0. \tag{1.3}$$

An obvious attack can be based on formula (5.6) of 
Chapter 1. In fact, we have 

$$
\begin{aligned}
\mu\{|r_1(t) + \cdots + r_n(t)| > \varepsilon n\}
&= \sum_{|2l - n| > \varepsilon n}\mu\{r_1(t) + \cdots + r_n(t) = 2l - n\} \\
&= \sum_{|2l - n| > \varepsilon n}\frac{1}{2^n}\binom{n}{l},
\end{aligned}
$$

and all we have to prove is that, for every $\varepsilon > 0$,

$$\lim_{n\to\infty}\sum_{|2l - n| > \varepsilon n}\frac{1}{2^n}\binom{n}{l} = 0. \tag{1.4}$$

Try it! It is not hard but not very easy either if you follow
the easy inclination and use Stirling’s formula. If you succeed,
you will have essentially rediscovered the original
proof of Bernoulli. But there is an easier and a better
way due to Tchebysheff.

You simply write 

$$
\begin{aligned}
\int_0^1 (r_1(t) + \cdots + r_n(t))^2\,dt
&\ge \int_{|r_1(t) + \cdots + r_n(t)| > \varepsilon n} (r_1(t) + \cdots + r_n(t))^2\,dt \\
&\ge \varepsilon^2 n^2\,\mu\{|r_1(t) + \cdots + r_n(t)| > \varepsilon n\}.
\end{aligned}
\tag{1.5}
$$

If you have worked Problem 3 at the end of Chapter 1,
you will get

$$\int_0^1 (r_1(t) + \cdots + r_n(t))^2\,dt = n \tag{1.6}$$

and hence, using (1.5),

$$\mu\{|r_1(t) + \cdots + r_n(t)| > \varepsilon n\} \le \frac{1}{\varepsilon^2 n}, \tag{1.7}$$

which proves (1.3) with “plenty to spare.”

Remember this neat device of Tchebysheff; we’ll meet 
it again ! 
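To see how much is “to spare,” one can compare the exact measure, computed from the binomial sum in (1.4), with Tchebysheff's bound $1/(\varepsilon^2 n)$. This is a minimal sketch with illustrative function names, using exact rational arithmetic so that no rounding enters the comparison.

```python
import math
from fractions import Fraction


def tail_measure(n: int, eps: Fraction) -> Fraction:
    """Exact measure of {|r_1 + ... + r_n| > eps * n}, summed as in (1.4)."""
    favourable = sum(
        math.comb(n, l) for l in range(n + 1) if abs(2 * l - n) > eps * n
    )
    return Fraction(favourable, 2 ** n)


def chebyshev_bound(n: int, eps: Fraction) -> Fraction:
    """Tchebysheff's bound 1 / (eps**2 * n) from (1.7)."""
    return 1 / (eps ** 2 * n)


if __name__ == "__main__":
    eps = Fraction(1, 10)
    for n in (10, 100, 1000):
        print(n, float(tail_measure(n, eps)), float(chebyshev_bound(n, eps)))
```

The bound is crude (for small n it even exceeds 1), but it tends to 0, and that is all the weak law requires.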

The statement (1.3) embodies the simplest example of 
what is technically known as “the weak law of large 
numbers.” The adjective “weak” is not meant to be 
derogatory and is used to distinguish it from another law 
of large numbers, referred to usually as “the strong
law.” “Strong” is not meant to be laudatory except that
for the game of “heads or tails” it implies the “weak law” 
and is therefore stronger in the logical sense. 

Both laws have been vastly generalized, and in their 
ultimate forms neither implies the other. These are, how- 
ever, technical questions which will not concern us here. 
The mathematical content of the weak law of large num- 
bers is relatively meager. In the form (1.4) it is an amus- 
ing theorem about binomial coefficients. Could this then 
be a formulation of the mysterious “law of averages” re- 
ferred to above? I am afraid so. This is essentially all we 
can hope for from a purely mathematical theory. 

2. Borel and “normal numbers.” Another law of 
large numbers was found by Borel. Borel proved that for 
almost every t (i.e., for all t’s except a set of Lebesgue
measure 0) one has

$$\lim_{n\to\infty}\frac{r_1(t) + r_2(t) + \cdots + r_n(t)}{n} = 0. \tag{2.1}$$
The proof is easy and is based on a well-known theorem 
from the theory of Lebesgue measure and integration. 
The theorem in question is as follows: 

If $\{f_n(t)\}$ is a sequence of non-negative Lebesgue integrable
functions, then convergence of

$$\sum_{n=1}^{\infty}\int_0^1 f_n(t)\,dt \tag{2.2}$$

implies convergence almost everywhere of the series

$$\sum_{n=1}^{\infty} f_n(t). \tag{2.3}$$

Set

$$f_n(t) = \left(\frac{r_1(t) + \cdots + r_n(t)}{n}\right)^4 \tag{2.4}$$

and consider

$$\int_0^1 \left(\frac{r_1(t) + \cdots + r_n(t)}{n}\right)^4 dt.$$

Using the result of Problem 3 at the end of Chapter 1, we
readily calculate that

$$\int_0^1 \left(\frac{r_1(t) + \cdots + r_n(t)}{n}\right)^4 dt = \frac{n + \dfrac{4!}{2!\,2!}\dbinom{n}{2}}{n^4}$$

and hence

$$\sum_{n=1}^{\infty}\int_0^1 f_n(t)\,dt < \infty.$$

It follows that

$$\sum_{n=1}^{\infty}\left(\frac{r_1(t) + \cdots + r_n(t)}{n}\right)^4$$

converges almost everywhere, and a fortiori

$$\lim_{n\to\infty}\left(\frac{r_1(t) + \cdots + r_n(t)}{n}\right)^4 = 0$$

almost everywhere. This proves (2.1).
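The fourth-moment computation at the heart of this proof is easy to confirm. The sketch below is illustrative (the function names are assumptions). It computes $\int_0^1 (r_1 + \cdots + r_n)^4\,dt$ exactly from the distribution (5.6) of Chapter 1 and compares it with $n + 6\binom{n}{2}$, where $6 = 4!/(2!\,2!)$.

```python
import math
from fractions import Fraction


def fourth_moment(n: int) -> Fraction:
    """Integral of (r_1 + ... + r_n)**4 over (0, 1).

    The sum equals 2l - n on a set of measure C(n, l) / 2**n.
    """
    total = sum(math.comb(n, l) * (2 * l - n) ** 4 for l in range(n + 1))
    return Fraction(total, 2 ** n)


def closed_form(n: int) -> int:
    """n + (4! / (2! 2!)) * C(n, 2), which simplifies to 3 n**2 - 2 n."""
    return n + 6 * math.comb(n, 2)


def series_partial_sum(terms: int) -> float:
    """Partial sum of the integrals of f_n, i.e. of closed_form(n) / n**4."""
    return sum(closed_form(n) / n ** 4 for n in range(1, terms + 1))


if __name__ == "__main__":
    print([(fourth_moment(n), closed_form(n)) for n in range(1, 6)])
    print(series_partial_sum(2000))
```

Since each term is at most $3/n^2$, the partial sums stay below $3\sum 1/n^2 = \pi^2/2$, which is the convergence used above.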

If we recall that 

$$r_k(t) = 1 - 2\varepsilon_k(t),$$

then (2.1) is equivalent to saying that, for almost every t,

$$\lim_{n\to\infty}\frac{\varepsilon_1(t) + \cdots + \varepsilon_n(t)}{n} = \frac{1}{2}.$$

In other words, almost every number t has (asymptotically!)
the same number of zeros and ones in its binary
expansion! This is the arithmetical content of Borel’s
theorem. What does the theorem say probabilistically? 
Using our dictionary, we arrive at the following statement: 
If a “fair” coin is tossed indefinitely and if the tosses are 
independent, then with probability 1 the frequency with 
which heads (tails) appear is $\tfrac{1}{2}$ (in the limit, of course).
This statement satisfies our intuitive feeling of what a 
“law of averages” ought to say and reassures us as to the 
validity of our dictionary. 

The reader is undoubtedly aware that there is nothing 
sacred about the base 2. 

If g is an integer greater than 1, we can write

$$t = \frac{\omega_1(t)}{g} + \frac{\omega_2(t)}{g^2} + \cdots, \qquad 0 < t < 1,$$

where each digit $\omega(t)$ can now assume the values 0, 1, ···,
g − 1. We leave it to the reader to prove that for almost
every t (0 < t < 1)

$$\lim_{n\to\infty}\frac{F_n^{(k)}(t)}{n} = \frac{1}{g}, \tag{2.7}$$

where $F_n^{(k)}(t)$ denotes the number of times the digit k,
$0 \le k \le g - 1$, occurs among the first n $\omega$’s. (This is
Problem 1 on page 18.)

From the fact that a denumerable union of sets of 
measure 0 is of measure 0, it follows that almost every 
number t, 0 < t < 1, is such that in every system of notation
(i.e., for every g > 1) each allowable digit appears
with proper (and just!) frequency. In other words, almost 
every number is “normal”! 

As is often the case, it is much easier to prove that an 
overwhelming majority of objects possess a certain 
property than to exhibit even one such object. The present 
case is no exception. It is quite difficult to exhibit a 
“normal” number! The simplest example is the number 
(written in decimal notation) 

0.123456789101112131415161718192021- • •, 

where after the decimal point we write out all positive 
integers in succession. The proof that this number is 
normal is by no means trivial. 
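The digits of this number are easy to generate, which makes for a small experiment. The sketch below uses hypothetical function names. It counts how often each decimal digit occurs among the first N digits. Such a finite count proves nothing about normality and the frequencies drift toward 1/10 only slowly, but it shows what the statement is about.

```python
from collections import Counter
from itertools import count


def champernowne_digits(n_digits: int) -> str:
    """First n_digits digits after the decimal point of 0.123456789101112..."""
    digits = []
    for integer in count(1):
        digits.extend(str(integer))
        if len(digits) >= n_digits:
            return "".join(digits[:n_digits])


def digit_frequencies(n_digits: int) -> dict:
    """Relative frequency of each decimal digit among the first n_digits."""
    counts = Counter(champernowne_digits(n_digits))
    return {d: counts[d] / n_digits for d in "0123456789"}


if __name__ == "__main__":
    print(champernowne_digits(33))
    for digit, freq in digit_frequencies(100_000).items():
        print(digit, round(freq, 4))
```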


PROBLEMS 


1. Prove (2.7) by first proving that the $\omega$’s are independent and
then generalizing the result of Problem 3 of Chapter 1. 

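2. Let f(t), 0 < t < 1, be a continuous function. Prove that

$$\lim_{n\to\infty}\int_0^1\!\cdots\!\int_0^1 f\left(\frac{x_1 + \cdots + x_n}{n}\right)dx_1\cdots dx_n = f\left(\tfrac{1}{2}\right).$$

Hint: First prove, imitating Tchebysheff’s proof of (1.4), that the
n-dimensional volume of the set defined by the inequalities

$$\left|\frac{x_1 + \cdots + x_n}{n} - \frac{1}{2}\right| > \varepsilon, \qquad 0 < x_i < 1,$$

is less than $1/(12\varepsilon^2 n)$.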
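3. The “unfair” coin. Let $T_p(t)$, 0 < p < 1, be defined as follows:

$$T_p(t) = \begin{cases} \dfrac{t}{p}, & 0 \le t < p, \\[2mm] \dfrac{t - p}{1 - p}, & p \le t \le 1, \end{cases}$$

and let

$$\varepsilon_p(t) = \begin{cases} 1, & 0 \le t < p, \\ 0, & p \le t \le 1. \end{cases}$$

Plot the functions

$$\varepsilon_1^{(p)}(t) = \varepsilon_p(t), \quad \varepsilon_2^{(p)}(t) = \varepsilon_p(T_p(t)), \quad \varepsilon_3^{(p)}(t) = \varepsilon_p(T_p(T_p(t))), \ \cdots$$

and show that they are independent. Note that, if $p = \tfrac{1}{2}$, one
obtains one minus the binary digits.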

4. Prove that the measure of the set on which

$$\varepsilon_1^{(p)}(t) + \cdots + \varepsilon_n^{(p)}(t) = l, \qquad 0 \le l \le n,$$

is equal to

$$\binom{n}{l} p^l (1 - p)^{n - l}.$$

5. Explain how the functions $\varepsilon_k^{(p)}(t)$ can be used to construct a
model for independent tosses of an “unfair” coin, where the proba- 
bility of H is p and the probability of T is q = 1 — p. 

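6. Show that if f(t) is continuous then

$$\int_0^1 f\left(\frac{\varepsilon_1^{(p)}(t) + \cdots + \varepsilon_n^{(p)}(t)}{n}\right)dt = \sum_{k=0}^{n} f\left(\frac{k}{n}\right)\binom{n}{k} p^k (1 - p)^{n - k} = B_n(p).$$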


[The B n (p) are the famous Bernstein polynomials.] 

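7. Using Tchebysheff’s “trick” estimate the measure of the set on
which

$$\left|\frac{\varepsilon_1^{(p)}(t) + \cdots + \varepsilon_n^{(p)}(t)}{n} - p\right| > \varepsilon$$

and prove that

$$\lim_{n\to\infty} B_n(p) = f(p)$$

uniformly in $0 \le p \le 1$ [define $B_n(0) = f(0)$ and $B_n(1) = f(1)$].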
(This is the original proof of S. Bernstein of the famed theorem of 
Weierstrass on approximation of continuous functions by poly- 
nomials.) 
8. Let f(t) satisfy the Lipschitz condition of order 1; i.e.,

$$|f(t_1) - f(t_2)| < M|t_1 - t_2|, \qquad 0 \le t_1, t_2 \le 1,$$

where M is a constant independent of $t_1$ and $t_2$. Prove that

$$|f(p) - B_n(p)| \le \frac{M}{2}\,\frac{1}{\sqrt{n}}.$$

9. Let

$$f(t) = \left|t - \tfrac{1}{2}\right|, \qquad 0 \le t \le 1,$$

and note that it satisfies the Lipschitz condition of order 1. Use
the result of Problem 5 of Chapter 1 to estimate from below

$$\left|f\left(\tfrac{1}{2}\right) - B_n\left(\tfrac{1}{2}\right)\right|,$$

and thus show that the order $1/\sqrt{n}$ in the estimate of Problem 8
above is the best possible.

10. Prove that for almost every t

$$\lim_{n\to\infty}\frac{\varepsilon_1^{(p)}(t) + \cdots + \varepsilon_n^{(p)}(t)}{n} = p.$$


11. Show that there exists an increasing function $\phi_p(t)$ such that

$$\varepsilon_k^{(p)}(t) = \varepsilon_k(\phi_p(t)), \qquad k = 1, 2, \cdots$$

($\varepsilon_k$’s are the binary digits). Show further that for $p \neq \tfrac{1}{2}$ the function
$\phi_p(t)$ is “singular”; i.e., every set E of positive measure contains a
subset $E_1$ differing from E by a set of measure 0 and such that the
image $\phi_p(E_1)$ is of measure 0. [See Z. Lomnicki and S. Ulam, Fund.
Math. 23 (1934), 237-278, in particular pp. 268-269.]

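12. Show that for every $\varepsilon > 0$ the series

$$\sum_{n=1}^{\infty}\frac{1}{n^{2+\varepsilon}}\exp\left\{\sqrt{\frac{2\log n}{n}}\;|r_1(t) + \cdots + r_n(t)|\right\}$$

converges almost everywhere and that consequently

$$\limsup_{n\to\infty}\frac{|r_1(t) + \cdots + r_n(t)|}{\sqrt{n\log n}} \le \sqrt{2}$$

almost everywhere. Hint: Note that ($\xi$ real)

$$\int_0^1 e^{\xi|r_1(t) + \cdots + r_n(t)|}\,dt < \int_0^1 e^{\xi(r_1(t) + \cdots + r_n(t))}\,dt + \int_0^1 e^{-\xi(r_1(t) + \cdots + r_n(t))}\,dt = 2(\cosh\xi)^n.$$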
Note. The result that

$$\limsup_{n\to\infty}\frac{|r_1(t) + \cdots + r_n(t)|}{\sqrt{n\log n}} \le \sqrt{2}$$

was first obtained by Hardy and Littlewood in 1914 in a rather
complicated way. A much stronger result to the effect that

$$\limsup_{n\to\infty}\frac{|r_1(t) + \cdots + r_n(t)|}{\sqrt{n\log\log n}} = \sqrt{2}$$

almost everywhere was proved in 1922 by Khintchin. This is
considerably more difficult to prove.
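For readers who want to experiment with the Bernstein polynomials of Problems 6 to 9, here is a small numerical sketch. It is not a solution to any of the problems, and the function name `bernstein` is an illustrative choice. It evaluates $B_n(p)$ directly from the defining sum in floating point, so n should stay moderate (a few hundred) to avoid underflow.

```python
import math


def bernstein(f, n: int, p: float) -> float:
    """B_n(p) = sum over k of f(k/n) * C(n, k) * p**k * (1 - p)**(n - k)."""
    return sum(
        f(k / n) * math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        for k in range(n + 1)
    )


if __name__ == "__main__":
    def f(t):
        return abs(t - 0.5)

    for n in (10, 100, 400):
        error = abs(f(0.5) - bernstein(f, n, 0.5))
        # error * sqrt(n) stays bounded, consistent with the 1/sqrt(n) order
        print(n, error, error * math.sqrt(n))
```

For $f(t) = |t - \tfrac{1}{2}|$ the printed values of error times $\sqrt{n}$ remain bounded above by $\tfrac{1}{2}$ and away from 0, in line with Problems 8 and 9.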


3. “Heads or Tails” — a more abstract formula- 
tion. A universally accepted pattern of statistical theories 
(i.e., theories based on the notion of probability) can be 
briefly summarized as follows: 

One starts with a set $\Omega$ (“sample space”) whose measure
(probability) is assumed to be 1. In $\Omega$ there is a collection
of subsets (“elementary sets” or “elementary events”)
whose measures (probabilities) are given in advance.
The problem is to “extend” this measure to as wide a collection
of subsets of $\Omega$ as possible.

The rules for extending are the following:

1°. If $A_1, A_2, \cdots$ are disjoint (mutually exclusive) subsets
of $\Omega$ (events) and if they are measurable (i.e., can be
assigned a measure), then their union $\bigcup_{k=1}^{\infty} A_k$ is also
measurable, and

$$\mu\left\{\bigcup_{k=1}^{\infty} A_k\right\} = \sum_{k=1}^{\infty}\mu\{A_k\},$$

where $\mu\{\ \}$ is the measure assigned to the set in braces.

2°. If A is measurable, then so is its complement $\Omega - A$.
(It follows from 1° and 2° that $\mu\{\Omega - A\} = 1 - \mu\{A\}$
and, in particular, since $\Omega$ is measurable by postulation,
that the measure of the empty set is zero.)
3°. A subset of a set of measure zero is measurable.

Measurable functions $f(\omega)$, $\omega \in \Omega$, defined on $\Omega$ are
called “random variables” (a horrible and misleading
terminology, now, unfortunately, irretrievably entrenched).
Let us see how “heads or tails” fits into this scheme.

The sample space $\Omega$ is simply the set of all infinite
sequences of symbols H and T, i.e., sequences like

$\omega$: HTHHTTT···.

What are the elementary events? Customarily they are 
the “cylinder sets,” i.e., sets of sequences in which a finite 
number of specified places is held fixed. For instance, the 
set of sequences whose third element is H, seventh T, and 
eleventh T is a cylinder set. What measures are to be as- 
signed to these cylinder sets? This depends, of course, on 
the nonmathematical assumptions about coin tossing 
which we must translate into mathematical language. 
Independent tosses of a “fair coin” are translated into this
language by assigning to each cylinder set the measure

$$\frac{1}{2^k},$$

where k is the number of specified places held fixed. There
is now the important problem of proving uniqueness of the
extended measure. In our case, this can be done very
simply by appealing to the uniqueness of Lebesgue’s
measure. This states that if a measure $\mu$ defined on (0, 1)
satisfies 1°, 2°, and 3° and if the $\mu$-measure of every interval
is equal to its length, then $\mu$ is the ordinary Lebesgue
measure. If we write 1 for H and 0 for T, then to each sequence
of symbols H and T there corresponds (uniquely
except for a denumerable set of dyadic rationals) a number
t, 0 < t < 1, namely, the number whose binary digits
are given by the H’s and T’s of the sequence after they are
replaced by ones and zeros. This mapping also has the
property that it maps cylinder sets into unions of disjoint
intervals whose end points are dyadic rationals, and, moreover,
the measure we have assigned to the cylinder sets is
equal to the Lebesgue measure (length) of the set into
which it is mapped. Now, we are through!
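The correspondence between cylinder sets and unions of dyadic intervals can be made concrete. The following is a minimal sketch under the convention stated above (1 for H, 0 for T); the function names and the dictionary encoding of a cylinder set are assumptions made for illustration. It lists the dyadic intervals whose binary digits agree with the fixed places and adds up their lengths.

```python
from fractions import Fraction
from itertools import product


def cylinder_to_intervals(fixed: dict) -> list:
    """Dyadic intervals making up the image of a cylinder set.

    `fixed` maps a 1-based position to "H" or "T", e.g. {3: "H", 7: "T"}.
    Every binary prefix of length max(fixed) that agrees with the fixed
    places contributes one half-open interval [left, left + 2**-depth).
    """
    depth = max(fixed)
    intervals = []
    for bits in product((0, 1), repeat=depth):
        if all(bits[pos - 1] == (1 if face == "H" else 0)
               for pos, face in fixed.items()):
            left = sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))
            intervals.append((left, left + Fraction(1, 2 ** depth)))
    return intervals


def total_length(intervals: list) -> Fraction:
    """Lebesgue measure of a union of disjoint intervals."""
    return sum((b - a for a, b in intervals), Fraction(0))


if __name__ == "__main__":
    # The example from the text: third element H, seventh T, eleventh T.
    pieces = cylinder_to_intervals({3: "H", 7: "T", 11: "T"})
    print(len(pieces), total_length(pieces))
```

With k places held fixed the total length comes out as $1/2^k$, whatever the positions are.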

The uniqueness of extension can be also proved without 
an appeal to mapping. The most general theorem of this 
kind was proved by Kolmogoroff in his 1933 book Grundbegriffe
der Wahrscheinlichkeitsrechnung.

Once a measure on $\Omega$ has been firmly established, one can,
in a standard way, construct a theory of integration which
parallels the usual Lebesgue theory.

Let $\omega \in \Omega$, i.e., $\omega$ is a sequence of symbols H and T.

Set

$$X_k(\omega) = \begin{cases} +1, & \text{if the kth element of } \omega \text{ is H}, \\ -1, & \text{if the kth element of } \omega \text{ is T}. \end{cases}$$

The functions $X_k(\omega)$ are “independent random variables”
in the sense that

$$\mu\{X_1(\omega) = \delta_1,\ X_2(\omega) = \delta_2,\ \cdots,\ X_n(\omega) = \delta_n\} = \frac{1}{2^n} = \prod_{k=1}^{n}\mu\{X_k(\omega) = \delta_k\} \tag{3.1}$$

for every sequence of $\delta_j$, where each $\delta$ is either 1 or −1. It
is clear that the $X_k(\omega)$ furnish us with a model of independent
tosses of a “fair” coin.


4. What price abstraction? To abstract is presuma- 
bly to come down to essentials. It is to free oneself from 
accidental features and to focus one’s attention on the 
crucial ones. Abstractly, the theory of “heads or tails” 
(“fair” coin, independent tosses) is simply the study of 
functions $X_k(\omega)$ having property (3.1) defined on some
space $\Omega$ (of measure 1) in which there is given a measure $\mu$
satisfying 1°, 2°, and 3° of the preceding section. It is
immaterial what $\Omega$ is, and one is allowed to use only (3.1)
and the rudimentary properties of 1°, 2°, and 3° of the
measure. One must, of course, convince oneself that one
is not in a mathematical vacuum, i.e., that the objects we
are talking about can be defined. This is accomplished by
taking $\Omega$ to be the “sample space” and by constructing the
required measure $\mu$, as has been indicated in § 3. The
fact that a realization of the $X_k(\omega)$ is given by the Rademacher
functions $r_k(t)$, i.e., that we can take for $\Omega$ the
interval (0, 1) with the ordinary Lebesgue measure, can be
considered as accidental. Note that, with the exception
of an amusing proof of Vieta’s formula in which we have
used a very special property of the Rademacher functions,
namely, that

$$1 - 2t = \sum_{k=1}^{\infty}\frac{r_k(t)}{2^k},$$

we have never appealed to anything but the property
(3.1) and the general properties of measure. But the price 
one may be called upon to pay for unrestrained abstraction 
is greater, much greater in fact. For unrestrained abstrac- 
tion tends also to divert attention from whole areas of 
application whose very discovery depends on features that 
the abstract point of view rules out as being accidental. 
Illustrations of this point are scattered throughout the 
book. Let me begin by giving a few examples from the 
realm already familiar to us. 
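The special property of the Rademacher functions quoted in this section, $1 - 2t = \sum_{k=1}^{\infty} r_k(t)/2^k$, lends itself to a quick exact check. The sketch below is an illustration only; it assumes the convention $r_k(t) = 1 - 2\epsilon_k(t)$ with $\epsilon_k(t)$ the $k$th binary digit of $t$, and the function names are hypothetical. Under that convention the error after $n$ terms is at most $2^{-n}$.

```python
from fractions import Fraction


def rademacher(k, t):
    """r_k(t) = 1 - 2 * (k-th binary digit of t), for 0 <= t < 1."""
    return 1 - 2 * (int(t * 2**k) % 2)


def partial_sum(t, n):
    """Exact value of sum_{k=1}^{n} r_k(t) / 2^k for a rational t."""
    return sum(Fraction(rademacher(k, t), 2**k) for k in range(1, n + 1))


def truncation_error(t, n):
    """|(1 - 2t) - sum_{k<=n} r_k(t)/2^k|, which is at most 2^-n."""
    return abs((1 - 2 * t) - partial_sum(t, n))
```

For example, `truncation_error(Fraction(1, 3), 30)` can be compared directly with `Fraction(1, 2**30)`.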

5. Example 1. Convergence of series with random signs. What is
the probability that the series

$$\sum_{k=1}^{\infty} \pm c_k \qquad (c_k \text{ real}),$$

with signs chosen independently and each with probability 1/2,
converges? This problem was first posed in this form by
H. Steinhaus in 1922 (and independently by N. Wiener) to whom we
also owe the essence of § 3. Steinhaus noted that the problem is
equivalent to finding the measure of the set of $t$’s for which
the series

$$\sum_{k=1}^{\infty} c_k r_k(t) \tag{5.1}$$

converges. This question had at that time been already answered
by Rademacher who proved that, if

$$\sum_{k=1}^{\infty} c_k^2 < \infty, \tag{5.2}$$

the series (5.1) converges almost everywhere. We could, of
course, consider the convergence of

$$\sum_{k=1}^{\infty} c_k X_k(\omega), \tag{5.3}$$

where the $X_k(\omega)$ have the property (3.1). Indeed, the proof of
Kolmogoroff, who has found the ultimate generalization of
Rademacher’s theorem, used only (3.1). There is, however, a
beautiful proof due to Paley and Zygmund which makes an
essential use of Rademacher functions. It is this proof that we
shall reproduce here for reasons which will become apparent a
little later. The proof is based on two not quite elementary but
very important theorems:
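1. The Riesz-Fischer theorem, which states that if

$$\sum a_k^2 < \infty$$

and if $\phi_1(t), \phi_2(t), \ldots$ are orthonormal in a set $E$, i.e.,

$$\int_E \phi_i(t)\,\phi_j(t)\,dt = \delta_{ij}, \tag{5.4}$$

then there exists a function $f(t) \in L^2$ (i.e., $\int_E f^2(t)\,dt < \infty$)
such that

$$\lim_{n\to\infty} \int_E \Bigl(f(t) - \sum_{k=1}^{n} a_k \phi_k(t)\Bigr)^2 dt = 0. \tag{5.5}$$

2. The fundamental theorem of calculus, which in its “advanced”
version states that if

$$\int_0^1 |f(t)|\,dt < \infty, \tag{5.6}$$

then, for almost every $t_0$,

$$\lim_{m\to\infty} \frac{1}{\beta_m - \alpha_m} \int_{\alpha_m}^{\beta_m} f(t)\,dt = f(t_0) \tag{5.7}$$

provided that

$$\alpha_m < t_0 < \beta_m \quad\text{and}\quad \lim_{m\to\infty} \alpha_m = \lim_{m\to\infty} \beta_m = t_0.$$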


Now, we know that the Rademacher functions are orthonormal on
$(0, 1)$:

$$\int_0^1 r_i(t)\,r_j(t)\,dt = \delta_{ij}.$$

Consequently (by the Riesz-Fischer theorem stated above) there
exists a function $f(t)$ such that

$$\int_0^1 f^2(t)\,dt < \infty \tag{5.8}$$

and

$$\lim_{n\to\infty} \int_0^1 \Bigl(f(t) - \sum_{k=1}^{n} c_k r_k(t)\Bigr)^2 dt = 0. \tag{5.9}$$

(Recall that we assume $\sum_{1}^{\infty} c_k^2 < \infty$.)

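Now let $t_0$ be such that (5.7) holds [(5.8) implies (5.6)!],
and let

$$\alpha_m = \frac{k_m}{2^m} < t_0 < \frac{k_m + 1}{2^m} = \beta_m \tag{5.10}$$

(we exclude the possibility that $t_0$ is a dyadic rational). We
have

$$\Bigl|\int_{\alpha_m}^{\beta_m} \Bigl(f(t) - \sum_{k=1}^{n} c_k r_k(t)\Bigr) dt\Bigr| \le (\beta_m - \alpha_m)^{1/2} \Bigl(\int_{\alpha_m}^{\beta_m} \Bigl(f(t) - \sum_{k=1}^{n} c_k r_k(t)\Bigr)^2 dt\Bigr)^{1/2}$$

and hence by (5.9)

$$\int_{\alpha_m}^{\beta_m} f(t)\,dt = \sum_{k=1}^{\infty} c_k \int_{\alpha_m}^{\beta_m} r_k(t)\,dt. \tag{5.11}$$

Observe now that

$$\int_{\alpha_m}^{\beta_m} r_k(t)\,dt = 0, \qquad k > m, \tag{5.12}$$

and

$$\int_{\alpha_m}^{\beta_m} r_k(t)\,dt = (\beta_m - \alpha_m)\,r_k(t_0), \qquad k \le m. \tag{5.13}$$

Thus (5.11) becomes

$$\frac{1}{\beta_m - \alpha_m} \int_{\alpha_m}^{\beta_m} f(t)\,dt = \sum_{k=1}^{m} c_k r_k(t_0)$$

and hence by (5.7)

$$\sum_{k=1}^{\infty} c_k r_k(t_0)$$

converges.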
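The two properties (5.12) and (5.13) on which the argument turns can be verified exactly on any dyadic interval. The following sketch is illustrative only; it assumes the convention $r_k(t) = 1 - 2\epsilon_k(t)$ (with $\epsilon_k$ the $k$th binary digit), and `integral_over_dyadic` is a hypothetical helper that integrates $r_k$ over $(j/2^m, (j+1)/2^m)$ by summing over the finer dyadic intervals on which $r_k$ is constant.

```python
from fractions import Fraction


def rademacher(k, t):
    """r_k(t) = 1 - 2 * (k-th binary digit of t), for 0 <= t < 1."""
    return 1 - 2 * (int(t * 2**k) % 2)


def integral_over_dyadic(k, m, j):
    """Exact integral of r_k over the interval (j/2^m, (j+1)/2^m)."""
    # r_k is constant on dyadic intervals of rank max(k, m).
    rank = max(k, m)
    pieces = 2 ** (rank - m)        # number of rank-`rank` pieces in the interval
    width = Fraction(1, 2**rank)
    first = j * pieces
    total = Fraction(0)
    for i in range(first, first + pieces):
        midpoint = Fraction(2 * i + 1, 2 ** (rank + 1))
        total += rademacher(k, midpoint) * width
    return total
```

With $m = 4$ and the interval $(5/16, 6/16)$, for instance, the integral vanishes for every $k > 4$ and equals $r_k(t_0)/16$ for $k \le 4$ and any $t_0$ inside the interval.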

The above argument can be extended immediately to proving that
the series

$$\sum_{k=1}^{\infty} c_k \sin 2\pi 2^k t \tag{5.14}$$

converges almost everywhere if

$$\sum c_k^2 < \infty. \tag{5.15}$$

This theorem suggests itself naturally if one notices that

$$r_k(t) = \operatorname{sgn} \sin 2\pi 2^{k-1} t.$$

In fact, our proof hinged on three properties of the Rademacher
functions:

1° Orthonormality

2° (5.12)

3° (5.13)

Of these 1° and 2° are satisfied when $r_k(t)$ is replaced by
$\sin 2\pi 2^k t$. Property 3° is not strictly satisfied, but we have,
for $k \le m$,

$$\int_{\alpha_m}^{\beta_m} \sin 2\pi 2^k t\,dt = (\beta_m - \alpha_m) \sin 2\pi 2^k t_0 + \int_{\alpha_m}^{\beta_m} (\sin 2\pi 2^k t - \sin 2\pi 2^k t_0)\,dt \tag{5.16}$$

and

$$\begin{aligned} \Bigl|\int_{\alpha_m}^{\beta_m} (\sin 2\pi 2^k t - \sin 2\pi 2^k t_0)\,dt\Bigr| &\le \int_{\alpha_m}^{\beta_m} |\sin 2\pi 2^k t - \sin 2\pi 2^k t_0|\,dt \le 2\pi 2^k \int_{\alpha_m}^{\beta_m} |t - t_0|\,dt \\ &\le 2\pi 2^k (\beta_m - \alpha_m)^2 = \frac{2\pi}{2^{m-k}}\,(\beta_m - \alpha_m). \end{aligned}$$

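Now, instead of

$$\frac{1}{\beta_m - \alpha_m} \int_{\alpha_m}^{\beta_m} f(t)\,dt = \sum_{k=1}^{m} c_k r_k(t_0),$$

we get

$$\Bigl|\frac{1}{\beta_m - \alpha_m} \int_{\alpha_m}^{\beta_m} f(t)\,dt - \sum_{k=1}^{m} c_k \sin 2\pi 2^k t_0\Bigr| \le 2\pi \sum_{k=1}^{m} \frac{|c_k|}{2^{m-k}},$$

and since $c_n \to 0$ as $n \to \infty$ (remember that $\sum c_n^2 < \infty$!),
one has

$$\lim_{m\to\infty} \sum_{k=1}^{m} \frac{|c_k|}{2^{m-k}} = 0,$$

and this is sufficient to complete the proof.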

The theorem we have just proved concerning convergence of

$$\sum_{k=1}^{\infty} c_k \sin 2\pi 2^k t$$

is actually a special case of a famous theorem of Kolmogoroff to
the effect that

$$\sum_{k=1}^{\infty} c_k^2 < \infty$$

implies convergence almost everywhere of

$$\sum_{k=1}^{\infty} c_k \sin 2\pi n_k t$$

provided that there exists a number $q$ such that

$$\frac{n_{k+1}}{n_k} > q > 1.$$


Kolmogoroff’s proof used, in an essential way, the fact that the
series in question were trigonometric series, but, by an
extension of the Paley-Zygmund argument, one can prove the
following much more general theorem:

If $g(t)$ is periodic with period 1 and if

(a) $\int_0^1 g(t)\,dt = 0$,

(b) $|g(t') - g(t'')| < M\,|t' - t''|^{\alpha}$, $0 < \alpha < 1$,

then convergence of $\sum c_k^2$ implies convergence almost
everywhere of

$$\sum_{k=1}^{\infty} c_k\,g(n_k t)$$

provided that the integers $n_k$ are such that

$$\frac{n_{k+1}}{n_k} > q > 1.$$

The proof of this statement is a little too technical to be
reproduced here, though no essentially new idea beyond that of
Paley and Zygmund is needed.

What is the moral of all this? The seemingly accidental fact
that

$$r_k(t) = \operatorname{sgn} \sin 2\pi 2^{k-1} t$$

suggests that there may be analogies between $r_k(t)$ and
$\sin 2\pi 2^{k-1} t$. Since the $r_k(t)$ have a definite probabilistic
interpretation, a way is opened to connect “heads or tails” with
a mathematical realm unrelated to chance, probability, coins,
and what have you. Could this be achieved if we had insisted on
treating “heads or tails” abstractly? Perhaps, but I doubt it.

6. Example 2. Divergence of series with random signs. What
happens to the series

$$\sum_{k=1}^{\infty} \pm c_k \tag{6.1}$$

if

$$\sum_{k=1}^{\infty} c_k^2 = \infty? \tag{6.2}$$

The answer is now that (6.1) diverges with probability 1. The
proof is quite simple. First, we note that our problem is simply
to determine the measure of the set of convergence of

$$\sum_{k=1}^{\infty} c_k r_k(t) \tag{6.3}$$

under the condition (6.2). Next, we note that the set of
convergence of (6.3) must be either of measure 0 or measure 1 (a
special case of the so-called zero-one law). Recall that

$$r_k(t) = r_1(2^{k-1} t)$$

(it should be understood that $r_k(t)$ is defined in such a way
that it is periodic with period 1; in other words,
$r_k(t + 1) = r_k(t)$), and hence if $t$ is in the set of
convergence then so is

$$t + \frac{1}{2^l}$$

for $l = 0, 1, 2, \ldots$.

In fact, if $t$ is replaced by $t + 2^{-l}$ only a finite number of
terms of (6.3) are changed, and this cannot affect convergence.
Thus the characteristic function of the set of convergence has
arbitrarily small periods, and by a well-known theorem it must be
a constant almost everywhere, the constant being either 0 or 1.*

We can assume that $c_n \to 0$, for otherwise the statement of our
theorem would be trivial.

Suppose now that (6.2) holds, $c_n \to 0$, and the series (6.3)
converges on a set of positive measure. By the remark above it
must converge almost everywhere. Hence, there exists a
measurable function $g(t)$ such that

$$\lim_{n\to\infty} \sum_{k=1}^{n} c_k r_k(t) = g(t) \tag{6.4}$$

almost everywhere. From (6.4) it follows that, for every real
$\xi \neq 0$,

$$\lim_{n\to\infty} \exp\Bigl[i\xi \sum_{k=1}^{n} c_k r_k(t)\Bigr] = e^{i\xi g(t)}$$


almost everywhere. By Lebesgue’s theorem on bounded convergence
we conclude that

$$\lim_{n\to\infty} \int_0^1 \exp\Bigl[i\xi \sum_{k=1}^{n} c_k r_k(t)\Bigr] dt = \int_0^1 e^{i\xi g(t)}\,dt. \tag{6.5}$$

* For a bounded, measurable (hence Lebesgue integrable!)
function $\phi(t)$ the proof is as follows: We have

$$I = \int_0^1 \phi(t)\,dt = \sum_{k=0}^{2^l - 1} \int_{k/2^l}^{(k+1)/2^l} \phi(t)\,dt = 2^l \int_{k/2^l}^{(k+1)/2^l} \phi(t)\,dt.$$

Let $t_0$ be such that

$$\lim_{l\to\infty} 2^l \int_{k_l/2^l}^{(k_l+1)/2^l} \phi(t)\,dt = \phi(t_0)$$

for $k_l/2^l < t_0 < (k_l + 1)/2^l$. From the fundamental theorem of
calculus (see § 5) almost every $t_0$ has this property. Thus
$\phi(t_0) = I$ for almost every $t_0$. If $\phi(t)$ is not assumed
bounded, apply the above argument to $e^{i\phi(t)}$. This proof is
due to Hartman and Kirshner; the theorem was first proved in a
more complicated way by Burstin. That the characteristic
function of the set of convergence is measurable is clear since
the set of convergence of a series of measurable functions is
measurable.


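But we know that

$$\int_0^1 \exp\Bigl[i\xi \sum_{k=1}^{n} c_k r_k(t)\Bigr] dt = \prod_{k=1}^{n} \cos \xi c_k \tag{6.6}$$

and we leave it to the reader to prove that (6.2) and $c_n \to 0$
imply

$$\lim_{n\to\infty} \prod_{k=1}^{n} \cos \xi c_k = 0.$$

Thus,

$$\int_0^1 e^{i\xi g(t)}\,dt = 0 \tag{6.7}$$

for every real $\xi \neq 0$.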
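Formula (6.6) is a finite identity and can be confirmed numerically. The sketch below is an illustration under stated assumptions, not the author's proof: it uses the fact that $(r_1, \ldots, r_n)$ takes each of the $2^n$ sign patterns on a set of measure $2^{-n}$, so the integral is an average over sign patterns. The names `sign_average` and `cosine_product` and the choice $c_k = 1/\sqrt{k}$ (for which $\sum c_k^2 = \infty$ and $c_k \to 0$) are hypothetical.

```python
import cmath
import math
from itertools import product


def sign_average(xi, c):
    """Integral over (0, 1) of exp(i*xi*sum_k c_k r_k(t)).

    Computed as the average over the 2^n equally likely sign patterns
    of (r_1, ..., r_n), where n = len(c).
    """
    n = len(c)
    total = 0j
    for signs in product((1, -1), repeat=n):
        total += cmath.exp(1j * xi * sum(s * ck for s, ck in zip(signs, c)))
    return total / 2**n


def cosine_product(xi, c):
    """Right-hand side of (6.6): the product of cos(xi * c_k)."""
    return math.prod(math.cos(xi * ck) for ck in c)
```

Evaluating `cosine_product` with $c_k = 1/\sqrt{k}$ for growing $n$ also illustrates how the product drifts toward 0 when $\sum c_k^2$ diverges.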

Now take a sequence $\xi_n \to 0$, but make sure that each $\xi_n \neq 0$
(e.g. $\xi_n = n^{-1}$); we have

$$\lim_{n\to\infty} \xi_n g(t) = 0$$

for almost every $t$ and hence

$$\lim_{n\to\infty} e^{i\xi_n g(t)} = 1$$

for almost every $t$.

Again by Lebesgue’s theorem on dominated convergence

$$\lim_{n\to\infty} \int_0^1 e^{i\xi_n g(t)}\,dt = 1$$

which implies $0 = 1$, a contradiction. Hence (6.3) could not
converge on a set of positive measure. Hence it must diverge
almost everywhere.

This method of proof utilizes independence of the $r_k(t)$ in an
essential way [see (6.6)] and does not seem immediately
applicable to studying the series

$$\sum_{k=1}^{\infty} c_k \sin 2\pi n_k t, \qquad \frac{n_{k+1}}{n_k} > q > 1,$$

under the condition

$$\sum_{k=1}^{\infty} c_k^2 = \infty.$$

Actually, the method can still be adapted, but we postpone the
discussion of this point until later.


PROBLEMS 


1. Let $\sum_{1}^{\infty} c_k^2 = \infty$, $c_k \to 0$ and consider the series

$$\sum_{k=1}^{\infty} c_k \sin 2\pi 2^{k-1} t.$$


(a) Prove that 



exists and find its value. 

(b) Prove that if $F_n(t)$, $0 \le t \le 1$, is a sequence of functions such
that


lim 

n— 





lim 

n— >oo 



dt = (3 


then the measure of the set $E$ on which $F_n(t)$ approaches 0 cannot
exceed

(c) Using (a) and (b), prove that under the conditions of the
problem the series

$$\sum_{k=1}^{\infty} c_k \sin 2\pi 2^{k-1} t$$

diverges almost everywhere.

2. The following example shows that sine in the theorem of
Problem 1 cannot be replaced by an “arbitrary” periodic function
$f(t)$ of period 1 (subject, of course, to the condition
$\int_0^1 f(t)\,dt = 0$).

Let

$$f(t) = \sin 2\pi t - \sin 4\pi t;$$

show that 


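converges everywhere.


CHAPTER 3


THE NORMAL LAW


1. De Moivre. In § 1 of Chapter 2, we discussed the “weak law of
large numbers.” A more precise result was proved by De Moivre to
the effect that

$$\lim_{n\to\infty} \mu\Bigl\{\omega_1 < \frac{r_1(t) + \cdots + r_n(t)}{\sqrt{n}} < \omega_2\Bigr\} = \frac{1}{\sqrt{2\pi}} \int_{\omega_1}^{\omega_2} e^{-y^2/2}\,dy. \tag{1.1}$$

The reader will have no trouble interpreting this result in
probability terms. An elementary proof can be based on formula
(5.6) of Chapter 1, and (1.1) becomes equivalent to the purely
combinatorial formula

$$\lim_{n\to\infty} \sum_{\frac{n}{2} + \frac{\omega_1}{2}\sqrt{n} \,<\, l \,<\, \frac{n}{2} + \frac{\omega_2}{2}\sqrt{n}} \binom{n}{l} \frac{1}{2^n} = \frac{1}{\sqrt{2\pi}} \int_{\omega_1}^{\omega_2} e^{-y^2/2}\,dy. \tag{1.2}$$

Adroit use of Stirling’s formula will yield (1.2), but this proof
will also obscure the nature of the theorem. Attempts to
generalize (1.1) provided one of the strongest motivations for
developing analytical tools of probability theory. A powerful
method was proposed by Markoff, but he was unable to make it
rigorous. Some twenty years later, the method was justified by
Paul Levy. The next two sections are devoted to Markoff’s method.

2. The idea. Let

$$g(x) = \begin{cases} 1, & \omega_1 < x < \omega_2, \\ 0, & \text{otherwise}. \end{cases} \tag{2.1}$$

From the elementary theory of Fourier integrals, one knows that

$$g(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{i\omega_2 \xi} - e^{i\omega_1 \xi}}{i\xi}\,e^{-ix\xi}\,d\xi \tag{2.2}$$

with the usual proviso that for $x = \omega_1$ and $x = \omega_2$ one gets
$\tfrac{1}{2}$. Now unless $\omega_1$ and $\omega_2$ are integral multiples of
$1/\sqrt{n}$ one has

$$\begin{aligned} \mu\Bigl\{\omega_1 < \frac{r_1(t) + \cdots + r_n(t)}{\sqrt{n}} < \omega_2\Bigr\} &= \int_0^1 g\Bigl(\frac{r_1(t) + \cdots + r_n(t)}{\sqrt{n}}\Bigr) dt \\ &= \int_0^1 \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{i\omega_2 \xi} - e^{i\omega_1 \xi}}{i\xi} \exp\Bigl(-i\xi\,\frac{r_1(t) + \cdots + r_n(t)}{\sqrt{n}}\Bigr) d\xi\,dt. \end{aligned} \tag{2.3}$$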

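Interchanging the order of integration [easily justified in our
case since $r_1(t) + \cdots + r_n(t)$ assumes only a finite number of
values] we get

$$\begin{aligned} \mu\Bigl\{\omega_1 < \frac{r_1(t) + \cdots + r_n(t)}{\sqrt{n}} < \omega_2\Bigr\} &= \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{i\omega_2 \xi} - e^{i\omega_1 \xi}}{i\xi} \int_0^1 \exp\Bigl(-i\xi\,\frac{r_1(t) + \cdots + r_n(t)}{\sqrt{n}}\Bigr) dt\,d\xi \\ &= \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{i\omega_2 \xi} - e^{i\omega_1 \xi}}{i\xi}\,\cos^n \frac{\xi}{\sqrt{n}}\,d\xi. \end{aligned} \tag{2.4}$$

Now, for every real $\xi$,

$$\lim_{n\to\infty} \cos^n \frac{\xi}{\sqrt{n}} = e^{-\xi^2/2},$$

and it is tempting to conclude that

$$\lim_{n\to\infty} \mu\Bigl\{\omega_1 < \frac{r_1(t) + \cdots + r_n(t)}{\sqrt{n}} < \omega_2\Bigr\} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{i\omega_2 \xi} - e^{i\omega_1 \xi}}{i\xi}\,e^{-\xi^2/2}\,d\xi = \frac{1}{\sqrt{2\pi}} \int_{\omega_1}^{\omega_2} e^{-y^2/2}\,dy.$$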

What is the trouble with this method? The only step which needs
justification is the interchange of the operations of
integration and taking the limit $n \to \infty$. Unfortunately, the
limits of integration are $-\infty$ and $+\infty$, and the function

$$\frac{e^{i\omega_2 \xi} - e^{i\omega_1 \xi}}{i\xi}$$

is not absolutely integrable.

Markoff, who was a superb mathematician, was unable to overcome
this difficulty, and he abandoned the method!

The physicists, whose concept of rigor is less strict than ours,
still call the method the “Markoff method,” whereas
mathematicians are hardly aware of its origin.

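3. Markoff’s method made rigorous. The justification of
Markoff’s method is actually quite easy. It is based on a simple
idea of wide applicability.

First, let us examine formula (2.2). It is simply Fourier’s
formula

$$g(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(y)\,e^{i\xi(y - x)}\,dy\,d\xi \tag{3.1}$$

applied to the special function (2.1).

Introduce now two auxiliary functions, $g_\epsilon^+(x)$ and $g_\epsilon^-(x)$,
whose graphs* are shown below ($\epsilon > 0$, $2\epsilon < \omega_2 - \omega_1$).

[Figure: graphs of $g_\epsilon^+(x)$ and $g_\epsilon^-(x)$]

* The heights of both graphs are equal to 1.

We have

$$g_\epsilon^-(x) \le g(x) \le g_\epsilon^+(x) \tag{3.2}$$

and consequently

$$\int_0^1 g_\epsilon^-\Bigl(\frac{r_1(t) + \cdots + r_n(t)}{\sqrt{n}}\Bigr) dt \le \mu\Bigl\{\omega_1 < \frac{r_1(t) + \cdots + r_n(t)}{\sqrt{n}} < \omega_2\Bigr\} \le \int_0^1 g_\epsilon^+\Bigl(\frac{r_1(t) + \cdots + r_n(t)}{\sqrt{n}}\Bigr) dt. \tag{3.3}$$

Now

$$G_\epsilon^-(\xi) = \int_{-\infty}^{\infty} g_\epsilon^-(y)\,e^{iy\xi}\,dy \quad\text{and}\quad G_\epsilon^+(\xi) = \int_{-\infty}^{\infty} g_\epsilon^+(y)\,e^{iy\xi}\,dy$$

are absolutely integrable functions of $\xi$ in $(-\infty, \infty)$, and
because of this the argument of § 2 yields rigorously

$$\begin{aligned} \lim_{n\to\infty} \int_0^1 g_\epsilon^-\Bigl(\frac{r_1(t) + \cdots + r_n(t)}{\sqrt{n}}\Bigr) dt &= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-\xi^2/2} \int_{-\infty}^{\infty} g_\epsilon^-(y)\,e^{i\xi y}\,dy\,d\xi \\ &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g_\epsilon^-(y)\,e^{-y^2/2}\,dy \end{aligned} \tag{3.4}$$

and

$$\begin{aligned} \lim_{n\to\infty} \int_0^1 g_\epsilon^+\Bigl(\frac{r_1(t) + \cdots + r_n(t)}{\sqrt{n}}\Bigr) dt &= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-\xi^2/2} \int_{-\infty}^{\infty} g_\epsilon^+(y)\,e^{i\xi y}\,dy\,d\xi \\ &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g_\epsilon^+(y)\,e^{-y^2/2}\,dy. \end{aligned} \tag{3.5}$$

Combining (3.4) and (3.5) with (3.3), we get

$$\begin{aligned} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g_\epsilon^-(y)\,e^{-y^2/2}\,dy &\le \liminf_{n\to\infty} \mu\Bigl\{\omega_1 < \frac{r_1(t) + \cdots + r_n(t)}{\sqrt{n}} < \omega_2\Bigr\} \\ &\le \limsup_{n\to\infty} \mu\Bigl\{\omega_1 < \frac{r_1(t) + \cdots + r_n(t)}{\sqrt{n}} < \omega_2\Bigr\} \le \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g_\epsilon^+(y)\,e^{-y^2/2}\,dy. \end{aligned} \tag{3.6}$$

Since (3.6) is valid for every $\epsilon > 0$, we obtain at once

$$\lim_{n\to\infty} \mu\Bigl\{\omega_1 < \frac{r_1(t) + \cdots + r_n(t)}{\sqrt{n}} < \omega_2\Bigr\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(y)\,e^{-y^2/2}\,dy = \frac{1}{\sqrt{2\pi}} \int_{\omega_1}^{\omega_2} e^{-y^2/2}\,dy. \tag{3.7}$$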
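The limit (3.7) can be observed directly, since for finite $n$ the left-hand side is an exact binomial sum. The sketch below is only an illustration; it assumes that $r_1 + \cdots + r_n$ takes the value $n - 2l$ on a set of measure $\binom{n}{l}/2^n$, evaluates the normal integral through the error function, and also shows the pointwise limit $\cos^n(\xi/\sqrt{n}) \to e^{-\xi^2/2}$ used in § 2. All function names are hypothetical.

```python
import math


def binomial_measure(n, w1, w2):
    """Exact mu{ w1 < (r_1 + ... + r_n)/sqrt(n) < w2 }.

    The sum r_1 + ... + r_n equals n - 2*l on a set of measure
    C(n, l) / 2^n, where l counts the r_k equal to -1.
    """
    root = math.sqrt(n)
    favourable = sum(
        math.comb(n, l)
        for l in range(n + 1)
        if w1 < (n - 2 * l) / root < w2
    )
    return favourable / 2**n


def normal_integral(w1, w2):
    """(1/sqrt(2*pi)) * integral of exp(-y^2/2) over (w1, w2), via erf."""
    return 0.5 * (math.erf(w2 / math.sqrt(2)) - math.erf(w1 / math.sqrt(2)))


def cos_power(xi, n):
    """cos^n(xi / sqrt(n)), which tends to exp(-xi^2 / 2) as n grows."""
    return math.cos(xi / math.sqrt(n)) ** n
```

Comparing `binomial_measure(n, -1.0, 1.0)` with `normal_integral(-1.0, 1.0)` for increasing `n` shows the gap shrinking roughly like $1/\sqrt{n}$, which is the lattice spacing of the normalized sum.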


PROBLEMS 


1. In 1917, the late H. Weyl proved that for every irrational $\alpha$
the sequence $\alpha_n = n\alpha - [n\alpha]$, $n = 1, 2, \ldots$, is
equidistributed in $(0, 1)$. In other words, if
$0 \le \omega_1 < \omega_2 \le 1$ and $k_n(\omega_1, \omega_2)$ denotes the number of
$\alpha_j$’s, $1 \le j \le n$, which fall in $(\omega_1, \omega_2)$ then

$$\lim_{n\to\infty} \frac{k_n(\omega_1, \omega_2)}{n} = \omega_2 - \omega_1.$$

Introducing the function $g(x)$, periodic with period 1, given by
(2.1) in $(0, 1)$ and using Fourier series instead of Fourier
integrals, prove Weyl’s theorem.

2. Use Markoff’s method to prove Laplace’s formula

$$\lim_{x\to\infty} e^{-x} \sum_{x + \omega_1\sqrt{x} \,<\, k \,<\, x + \omega_2\sqrt{x}} \frac{x^k}{k!} = \frac{1}{\sqrt{2\pi}} \int_{\omega_1}^{\omega_2} e^{-y^2/2}\,dy.$$


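4. A closer look at the method. An inspection of the derivation
of § 3 reveals that we have really proved the following theorem:

Let $f_n(t)$, $0 \le t \le 1$, be a sequence of measurable functions
such that for every real $\xi$

$$\lim_{n\to\infty} \int_0^1 e^{i\xi f_n(t)}\,dt = e^{-\xi^2/2}. \tag{4.1}$$

Then

$$\lim_{n\to\infty} \mu\{\omega_1 < f_n(t) < \omega_2\} = \frac{1}{\sqrt{2\pi}} \int_{\omega_1}^{\omega_2} e^{-y^2/2}\,dy. \tag{4.2}$$

Let

$$\sigma_n(\omega) = \mu\{f_n(t) < \omega\}, \tag{4.3}$$

then $\sigma_n(\omega)$ has the following properties:

1°. $\sigma_n(-\infty) = 0$, $\sigma_n(+\infty) = 1$.

2°. $\sigma_n(\omega)$ is nondecreasing.

3°. $\sigma_n(\omega)$ is left-continuous.

(Note that property 3° is a consequence of complete additivity
of Lebesgue’s measure.) A function $\sigma(\omega)$ having properties 1°,
2°, and 3° is called a distribution function. Now

$$\int_0^1 e^{i\xi f_n(t)}\,dt = \int_{-\infty}^{\infty} e^{i\xi\omega}\,d\sigma_n(\omega) \tag{4.4}$$

and our theorem can also be stated as follows:

If a sequence of distribution functions $\sigma_n(\omega)$ is such that
for every real $\xi$

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} e^{i\xi\omega}\,d\sigma_n(\omega) = e^{-\xi^2/2} \tag{4.5}$$

then

$$\sigma_n(\omega_2) - \sigma_n(\omega_1) \to G(\omega_2) - G(\omega_1), \tag{4.6}$$

where

$$G(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\omega} e^{-y^2/2}\,dy. \tag{4.7}$$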

An attentive reader will notice a slight logical gap. If we are
simply given a sequence of distribution functions $\sigma_n(\omega)$, the
last formulation follows from the preceding one only if we can
exhibit a sequence of functions $f_n(t)$, $0 \le t \le 1$, such that

$$\mu\{f_n(t) < \omega\} = \sigma_n(\omega). \tag{4.8}$$

One can circumvent this step by repeating, in essence, the
argument of § 3. But the construction of the functions $f_n(t)$
is exceedingly simple. In fact, we can simply take for $f_n(t)$
the inverse of $\sigma_n(\omega)$, with the understanding that the
intervals of constancy of $\sigma_n(\omega)$ are reflected in
discontinuities of $f_n(t)$ and discontinuities of $\sigma_n(\omega)$ in
intervals of constancy of $f_n(t)$. We leave the details to the
reader.

The conclusion that (4.5) implies (4.6) is a special case of an
important general theorem known as the continuity theorem for
Fourier-Stieltjes transforms. This theorem can be stated as
follows: If $\sigma_n(\omega)$ is a sequence of distribution functions such
that for every real $\xi$

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} e^{i\xi\omega}\,d\sigma_n(\omega) = c(\xi) \tag{4.9}$$

and if $c(\xi)$ is continuous for $\xi = 0$, there exists a unique
distribution function $\sigma(\omega)$ such that

$$\int_{-\infty}^{\infty} e^{i\xi\omega}\,d\sigma(\omega) = c(\xi) \tag{4.10}$$

and

$$\lim_{n\to\infty} \sigma_n(\omega) = \sigma(\omega) \tag{4.11}$$

for every $\omega$ for which $\sigma(\omega)$ is continuous.

The proof, in addition to ideas already explained, makes use of
the so-called Helly selection principle and is a little too
technical to be presented here. We consequently omit it though
we shall feel free to use the theorem in the sequel.


PROBLEMS 


1. Let $f_n(t)$, $0 \le t \le 1$, be such that for $k = 0, 1, 2, \ldots$ we
have

$$\lim_{n\to\infty} \int_0^1 f_n^k(t)\,dt = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y^k e^{-y^2/2}\,dy = \begin{cases} 0, & k \text{ odd}, \\ \dfrac{k!}{2^{k/2}\,(k/2)!}, & k \text{ even}. \end{cases}$$

Prove that for every real $\xi$

$$\lim_{n\to\infty} \int_0^1 e^{i\xi f_n(t)}\,dt = e^{-\xi^2/2}$$

and that consequently (4.2) holds.

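2. Let $\{n_m\}$ be a sequence of integers such that

$$\lim_{m\to\infty} \frac{n_{m+1}}{n_m} = \infty.$$

Prove that for $k = 0, 1, 2, \ldots$

$$\lim_{m\to\infty} \int_0^1 \Bigl(\sqrt{2}\,\frac{\cos 2\pi n_1 t + \cos 2\pi n_2 t + \cdots + \cos 2\pi n_m t}{\sqrt{m}}\Bigr)^k dt = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y^k e^{-y^2/2}\,dy$$

and hence

$$\lim_{m\to\infty} \mu\Bigl\{\omega_1 < \sqrt{2}\,\frac{\cos 2\pi n_1 t + \cos 2\pi n_2 t + \cdots + \cos 2\pi n_m t}{\sqrt{m}} < \omega_2\Bigr\} = \frac{1}{\sqrt{2\pi}} \int_{\omega_1}^{\omega_2} e^{-y^2/2}\,dy.$$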


Note: By the same method but utilizing trickier combinatorial
arguments one can prove that, if

$$\frac{n_{k+1}}{n_k} > q > 1,$$

and if

$$\sum_{k=1}^{\infty} c_k^2 = \infty \quad\text{and}\quad |c_k| < M,$$

then

$$\lim_{n\to\infty} \mu\Bigl\{\omega_1 < \sqrt{2}\,\frac{\sum_{k=1}^{n} c_k \cos 2\pi n_k t}{\sqrt{\sum_{k=1}^{n} c_k^2}} < \omega_2\Bigr\} = \frac{1}{\sqrt{2\pi}} \int_{\omega_1}^{\omega_2} e^{-y^2/2}\,dy.$$

In particular, it follows that $\sum_{1}^{\infty} c_k^2 = \infty$ implies
divergence almost everywhere of $\sum_{1}^{\infty} c_k \cos 2\pi n_k t$ (the
argument is, of course, applicable if one replaces cosine by
sine). As the reader can see this is closely related to the
method used in Example 2 (§ 6 of Chapter 2).

3. Let $\sigma(\omega)$ be a distribution function, and let

$$c(\xi) = \int_{-\infty}^{\infty} e^{i\xi\omega}\,d\sigma(\omega).$$

Prove that

$$\lim_{T\to\infty} \frac{1}{T} \int_0^T |c(\xi)|^2\,d\xi = \text{sum of the squares of the jumps of } \sigma(\omega).$$

(This simple but beautiful theorem is due to N. Wiener.)

(A proof can be based on noting that

$$c(\xi) = \int_0^1 e^{i\xi f(t)}\,dt,$$

where $f(t)$ is the inverse of $\sigma(\omega)$ as described above. Thus

$$\frac{1}{T} \int_0^T |c(\xi)|^2\,d\xi = \int_0^1 \int_0^1 \frac{1}{T} \int_0^T e^{i\xi(f(s) - f(t))}\,d\xi\,ds\,dt$$

and

$$\lim_{T\to\infty} \frac{1}{T} \int_0^T e^{i\xi(f(s) - f(t))}\,d\xi = \begin{cases} 0, & f(t) \neq f(s), \\ 1, & f(t) = f(s). \end{cases}$$

By the theorem on bounded convergence, it follows that

$$\lim_{T\to\infty} \frac{1}{T} \int_0^T |c(\xi)|^2\,d\xi$$

exists and is equal to the plane measure of the set of points
$(t, s)$ ($0 \le t, s \le 1$) for which $f(t) = f(s)$. This is
equivalent to our theorem.)

4. Prove that

$$f(t) = \sum_{k=1}^{\infty} c_k r_k(t), \qquad \sum_{k=1}^{\infty} c_k^2 < \infty,$$

cannot be constant on a set of positive measure unless all but a
finite number of $c$’s are equal to 0.

5. A law of nature or a mathematical theorem?

To conclude this chapter, we shall consider an example which
conceptually and technically is quite instructive.

First, we need three definitions.

1°. The relative measure. Let $A$ be a set of real numbers, and
consider the subset of $A$ which lies in $(-T, T)$, i.e.,
$A \cap (-T, T)$. The relative measure $\mu_R\{A\}$ of $A$ is defined as
the limit

$$\mu_R\{A\} = \lim_{T\to\infty} \frac{1}{2T}\,\mu\{A \cap (-T, T)\}, \tag{5.1}$$

if the limit exists. The relative measure is not completely
additive, for if $A_i = (i, i + 1)$, $i = 0, \pm 1, \pm 2, \ldots$, then

$$\mu_R\Bigl\{\bigcup_{i=-\infty}^{\infty} A_i\Bigr\} = 1$$

while

$$\sum_{i=-\infty}^{\infty} \mu_R\{A_i\} = 0.$$

2°. The mean value of a function. The mean value $M\{f(t)\}$ of the
function $f(t)$, $-\infty < t < \infty$, is defined as the limit

$$M\{f(t)\} = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} f(t)\,dt, \tag{5.2}$$

if the limit exists.

3°. Linear independence of real numbers. Real numbers
$\lambda_1, \lambda_2, \ldots$ are called linearly independent (or independent over
the field of rationals) if the only solution $(k_1, k_2, \ldots)$ in
integers of the equation

$$k_1 \lambda_1 + k_2 \lambda_2 + \cdots = 0 \tag{5.3}$$

is

$$k_1 = k_2 = k_3 = \cdots = 0.$$

The most famous example of linearly independent numbers is the
sequence

$$\log p_1, \log p_2, \log p_3, \ldots \tag{5.4}$$

of logarithms of primes ($p_1 = 2$, $p_2 = 3$, $\ldots$). As the reader
will no doubt notice, linear independence of (5.4) is equivalent
to the unique factorization theorem. This simple and beautiful
remark was made in 1910 by H. Bohr who made it a starting point
of a new attack on many problems related to the celebrated
$\zeta$-function of Riemann.

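Let now $\lambda_1, \lambda_2, \ldots$ be linearly independent, and consider
the function

$$\sqrt{2}\,\frac{\cos \lambda_1 t + \cdots + \cos \lambda_n t}{\sqrt{n}}. \tag{5.5}$$

Let $A_n(\omega_1, \omega_2)$ be the set on which

$$\omega_1 < \sqrt{2}\,\frac{\cos \lambda_1 t + \cdots + \cos \lambda_n t}{\sqrt{n}} < \omega_2. \tag{5.6}$$

We can now prove that $\mu_R\{A_n(\omega_1, \omega_2)\}$ is defined and moreover
that

$$\lim_{n\to\infty} \mu_R\{A_n(\omega_1, \omega_2)\} = \frac{1}{\sqrt{2\pi}} \int_{\omega_1}^{\omega_2} e^{-y^2/2}\,dy. \tag{5.7}$$

Using the notation of § 3 of this chapter, we have

$$\begin{aligned} \frac{1}{2T} \int_{-T}^{T} g_\epsilon^-\Bigl(\sqrt{2}\,\frac{\cos \lambda_1 t + \cdots + \cos \lambda_n t}{\sqrt{n}}\Bigr) dt &\le \frac{1}{2T} \int_{-T}^{T} g\Bigl(\sqrt{2}\,\frac{\cos \lambda_1 t + \cdots + \cos \lambda_n t}{\sqrt{n}}\Bigr) dt \\ &\le \frac{1}{2T} \int_{-T}^{T} g_\epsilon^+\Bigl(\sqrt{2}\,\frac{\cos \lambda_1 t + \cdots + \cos \lambda_n t}{\sqrt{n}}\Bigr) dt \end{aligned} \tag{5.8}$$

and

$$\frac{1}{2T} \int_{-T}^{T} g_\epsilon^{\pm}\Bigl(\sqrt{2}\,\frac{\cos \lambda_1 t + \cdots + \cos \lambda_n t}{\sqrt{n}}\Bigr) dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} G_\epsilon^{\pm}(\xi) \Bigl[\frac{1}{2T} \int_{-T}^{T} \exp\Bigl(-i\xi\sqrt{2}\,\frac{\cos \lambda_1 t + \cdots + \cos \lambda_n t}{\sqrt{n}}\Bigr) dt\Bigr] d\xi, \tag{5.9}$$

where* both $G_\epsilon^+(\xi)$ and $G_\epsilon^-(\xi)$ are absolutely integrable in
$(-\infty, \infty)$. (Thus, the interchange of order of integration is
easily justified.)

* Recall that we use the abbreviation $G_\epsilon^{\pm}(\xi) = \int_{-\infty}^{\infty} g_\epsilon^{\pm}(x)\,e^{i\xi x}\,dx$.

We now prove that for every real $\xi$ (in particular with $-\xi$ in
place of $\xi$, as needed in (5.9))

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \exp\Bigl(i\xi\sqrt{2}\,\frac{\cos \lambda_1 t + \cdots + \cos \lambda_n t}{\sqrt{n}}\Bigr) dt = J_0^n\Bigl(\sqrt{2}\,\frac{\xi}{\sqrt{n}}\Bigr), \tag{5.10}$$

where $J_0$ is the familiar Bessel function.

We carry out the proof for $n = 2$ since the proof for arbitrary
$n$ is exactly the same.

We have (setting $\eta = \xi\sqrt{2}/\sqrt{n}$)

$$\frac{1}{2T} \int_{-T}^{T} e^{i\eta(\cos \lambda_1 t + \cos \lambda_2 t)}\,dt = \sum_{k,l=0}^{\infty} \frac{(i\eta)^{k+l}}{k!\,l!}\,\frac{1}{2T} \int_{-T}^{T} \cos^k \lambda_1 t\,\cos^l \lambda_2 t\,dt, \tag{5.11}$$

and we must find

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \cos^k \lambda_1 t\,\cos^l \lambda_2 t\,dt = M\{\cos^k \lambda_1 t\,\cos^l \lambda_2 t\}.$$

Now

$$\begin{aligned} \cos^k \lambda_1 t\,\cos^l \lambda_2 t &= \frac{1}{2^k}\,\frac{1}{2^l}\,(e^{i\lambda_1 t} + e^{-i\lambda_1 t})^k\,(e^{i\lambda_2 t} + e^{-i\lambda_2 t})^l \\ &= \frac{1}{2^{k+l}} \sum_{r=0}^{k} \sum_{s=0}^{l} \binom{k}{r} \binom{l}{s} \exp\{i[(2r - k)\lambda_1 + (2s - l)\lambda_2]\,t\} \end{aligned}$$

and

$$M\{e^{i\alpha t}\} = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} e^{i\alpha t}\,dt = \begin{cases} 1, & \alpha = 0, \\ 0, & \alpha \neq 0. \end{cases}$$

Because of linear independence,

$$(2r - k)\lambda_1 + (2s - l)\lambda_2$$

can be zero only if $2r = k$ and $2s = l$, and thus it follows
almost immediately that

$$M\{\cos^k \lambda_1 t\,\cos^l \lambda_2 t\} = \frac{1}{2^k} \binom{k}{k/2}\,\frac{1}{2^l} \binom{l}{l/2} \tag{5.12}$$

if both $k$ and $l$ are even and 0 in all other cases. We can write
(5.12) in the form

$$M\{\cos^k \lambda_1 t\,\cos^l \lambda_2 t\} = M\{\cos^k \lambda_1 t\}\,M\{\cos^l \lambda_2 t\}, \tag{5.13}$$

and combining this with (5.11) we obtain

$$M\{e^{i\eta(\cos \lambda_1 t + \cos \lambda_2 t)}\} = M\{e^{i\eta \cos \lambda_1 t}\}\,M\{e^{i\eta \cos \lambda_2 t}\}. \tag{5.14}$$

It is clear that

$$M\{e^{i\eta \cos \lambda t}\} = \frac{1}{2\pi} \int_0^{2\pi} e^{i\eta \cos \theta}\,d\theta = J_0(\eta) \tag{5.15}$$

and hence [from (5.14)] that

$$M\{e^{i\eta(\cos \lambda_1 t + \cos \lambda_2 t)}\} = J_0^2(\eta).$$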


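Thus we can consider (5.10) as having been proved. Letting
$T \to \infty$ in (5.8) and using (5.9) and (5.10) we obtain

$$\begin{aligned} \frac{1}{2\pi} \int_{-\infty}^{\infty} G_\epsilon^-(\xi)\,J_0^n\Bigl(\sqrt{2}\,\frac{\xi}{\sqrt{n}}\Bigr) d\xi &\le \liminf_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} g\Bigl(\sqrt{2}\,\frac{\cos \lambda_1 t + \cdots + \cos \lambda_n t}{\sqrt{n}}\Bigr) dt \\ &\le \limsup_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} g\Bigl(\sqrt{2}\,\frac{\cos \lambda_1 t + \cdots + \cos \lambda_n t}{\sqrt{n}}\Bigr) dt \\ &\le \frac{1}{2\pi} \int_{-\infty}^{\infty} G_\epsilon^+(\xi)\,J_0^n\Bigl(\sqrt{2}\,\frac{\xi}{\sqrt{n}}\Bigr) d\xi. \end{aligned} \tag{5.16}$$

It is well known that as $\eta \to \pm\infty$

$$J_0(\eta) = O\Bigl(\frac{1}{\sqrt{|\eta|}}\Bigr)$$

and consequently, for $n \ge 3$,

$$J_0^n\Bigl(\sqrt{2}\,\frac{\xi}{\sqrt{n}}\Bigr)$$

is absolutely integrable in $\xi$. This implies that ($n \ge 3$)

$$\lim_{\epsilon\to 0} \frac{1}{2\pi} \int_{-\infty}^{\infty} G_\epsilon^-(\xi)\,J_0^n\Bigl(\sqrt{2}\,\frac{\xi}{\sqrt{n}}\Bigr) d\xi = \lim_{\epsilon\to 0} \frac{1}{2\pi} \int_{-\infty}^{\infty} G_\epsilon^+(\xi)\,J_0^n\Bigl(\sqrt{2}\,\frac{\xi}{\sqrt{n}}\Bigr) d\xi$$

and hence that

$$\mu_R\{A_n(\omega_1, \omega_2)\} = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} g\Bigl(\sqrt{2}\,\frac{\cos \lambda_1 t + \cdots + \cos \lambda_n t}{\sqrt{n}}\Bigr) dt$$

exists!* Now (5.16) can be written in the form

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} G_\epsilon^-(\xi)\,J_0^n\Bigl(\sqrt{2}\,\frac{\xi}{\sqrt{n}}\Bigr) d\xi \le \mu_R\{A_n(\omega_1, \omega_2)\} \le \frac{1}{2\pi} \int_{-\infty}^{\infty} G_\epsilon^+(\xi)\,J_0^n\Bigl(\sqrt{2}\,\frac{\xi}{\sqrt{n}}\Bigr) d\xi$$

and one verifies easily that

$$\lim_{n\to\infty} J_0^n\Bigl(\sqrt{2}\,\frac{\xi}{\sqrt{n}}\Bigr) = e^{-\xi^2/2}.$$

The proof of (5.7) can now be completed exactly as in § 3.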
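The limit $J_0^n(\sqrt{2}\,\xi/\sqrt{n}) \to e^{-\xi^2/2}$ on which the final step rests is easy to observe numerically. The sketch below is an illustration under stated assumptions: $J_0$ is evaluated from the integral representation (5.15) by averaging over an equally spaced grid (a choice made here for self-containedness, not a method taken from the text), and the names `bessel_j0` and `bessel_power` are hypothetical.

```python
import math


def bessel_j0(eta, samples=2000):
    """J_0(eta) = (1/2pi) * integral_0^{2pi} exp(i*eta*cos(theta)) d(theta).

    The imaginary part integrates to zero, so only cos(eta*cos(theta))
    is averaged over an equally spaced grid on [0, 2*pi).
    """
    step = 2 * math.pi / samples
    return sum(math.cos(eta * math.cos(j * step)) for j in range(samples)) / samples


def bessel_power(xi, n):
    """J_0(sqrt(2) * xi / sqrt(n)) ** n, expected to approach exp(-xi^2/2)."""
    return bessel_j0(math.sqrt(2) * xi / math.sqrt(n)) ** n
```

Comparing `bessel_power(xi, n)` with `math.exp(-xi**2 / 2)` for a fixed `xi` and increasing `n` shows the difference decreasing roughly like $1/n$.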
If we look upon

$$q_n(t) = \sqrt{2}\,\frac{\cos \lambda_1 t + \cdots + \cos \lambda_n t}{\sqrt{n}}$$

as a result of superposition of vibrations with incommensurable
frequencies, the theorem embodied in (5.7) gives precise
information about the relative time $q_n(t)$ spends between $\omega_1$
and $\omega_2$. That we are led here to the normal law

$$\frac{1}{\sqrt{2\pi}} \int_{\omega_1}^{\omega_2} e^{-y^2/2}\,dy$$

usually associated with random phenomena is perhaps an
indication that the deterministic and probabilistic points of
view are not as irreconcilable as they may appear at first
sight. To dwell further on this question would lead us too far
afield, but it may be appropriate to quote a statement of
Poincaré, who said (partly in jest no doubt) that there must be
something mysterious about the normal law since mathematicians
think it is a law of nature whereas physicists are convinced
that it is a mathematical theorem.

* For $n = 1$ and $n = 2$ this is still true, but the proof has to
be modified.


PROBLEMS 


1. Prove that if $\lambda_1, \ldots, \lambda_n$ are linearly independent then
the functions $\cos \lambda_1 t, \ldots, \cos \lambda_n t$ are statistically
independent, i.e., for all real $\alpha_1, \ldots, \alpha_n$

$$\mu_R\{\cos \lambda_1 t < \alpha_1, \ldots, \cos \lambda_n t < \alpha_n\} = \prod_{k=1}^{n} \mu_R\{\cos \lambda_k t < \alpha_k\}.$$

[It is, of course, this property that is at the heart of the
proof of (5.7).]

2. Let $s = \sigma + it$, $\sigma > 1$, and consider the $\zeta$-function of
Riemann

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_p \frac{1}{1 - p^{-s}}.$$

Prove that for $l > 0$

Af{|r(«r + *t)| 1 } = M 


\!(<r + 
CHAPTER 4


PRIMES PLAY A GAME OF CHANCE


1. Number theoretic functions, density, independence. A number
theoretic function $f(n)$ is a function defined on the positive
integers $1, 2, 3, \ldots$. The mean $M\{f(n)\}$ of $f$ is defined as the
limit (if it exists)

$$M\{f(n)\} = \lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} f(n). \tag{1.1}$$

If $A$ is a set of positive integers, we denote by $A(N)$ the
number of its elements among the first $N$ integers. If

$$\lim_{N\to\infty} \frac{A(N)}{N} = D\{A\}$$

exists, it is called the density of $A$. The density is analogous
to the relative measure (see § 5 of Chapter 3), and like
relative measure it is not completely additive. Consider the
integers divisible by a prime $p$. The density of the set of
these integers is clearly $1/p$. Take now the set of integers
divisible by both $p$ and $q$ ($q$ another prime). To be divisible
by $p$ and $q$ is equivalent to being divisible by $pq$, and
consequently the density of the new set is $1/pq$. Now

$$\frac{1}{pq} = \frac{1}{p} \cdot \frac{1}{q},$$

and we can interpret this by saying that the “events” of being
divisible by $p$ and $q$ are independent. This holds, of course,
for any number of primes, and we can say, using a picturesque
but not a very precise language, that the primes play a game of
chance! This simple, nearly trivial, observation is the
beginning of a new development which links in a significant way
number theory on the one hand and probability theory on the
other.

We shall illustrate in detail some of the elementary aspects of
this development and sketch briefly the more advanced ones.
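The independence of divisibility by two distinct primes can be watched at finite $N$. The sketch below is a simple illustration, not part of the original text: it estimates densities by the ratio $A(N)/N$, and the names `density_estimate` and `divisibility_densities` are hypothetical.

```python
def density_estimate(predicate, N):
    """A(N)/N: the proportion of the integers 1..N satisfying `predicate`."""
    return sum(1 for n in range(1, N + 1) if predicate(n)) / N


def divisibility_densities(p, q, N):
    """Finite-N estimates of the densities of multiples of p, of q, and of both."""
    d_p = density_estimate(lambda n: n % p == 0, N)
    d_q = density_estimate(lambda n: n % q == 0, N)
    d_pq = density_estimate(lambda n: n % p == 0 and n % q == 0, N)
    return d_p, d_q, d_pq
```

For distinct primes `p` and `q`, the third returned value differs from the product of the first two by an amount of order $1/N$.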

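2. The statistics of the Euler $\phi$-function. The number of
integers not exceeding $n$ and relatively prime to $n$ is denoted
by $\phi(n)$. This number theoretic function, first introduced by
Euler, has many applications and is of considerable interest in
itself.

One verifies at once that, if

$$(m, n) = 1$$

(i.e., $m$ and $n$ are relatively prime), then

$$\phi(mn) = \phi(m)\,\phi(n) \tag{2.1}$$

and

$$\phi(p^\alpha) = p^\alpha - p^{\alpha - 1}. \tag{2.2}$$

Thus

$$\phi(n) = \prod_{p^\alpha \mid n,\; p^{\alpha+1} \nmid n} (p^\alpha - p^{\alpha-1}) \tag{2.3}$$

or, since

$$n = \prod_{p^\alpha \mid n,\; p^{\alpha+1} \nmid n} p^\alpha \tag{2.4}$$

(unique factorization!),

$$\frac{\phi(n)}{n} = \prod_{p \mid n} \Bigl(1 - \frac{1}{p}\Bigr). \tag{2.5}$$

Let us now introduce the functions $\rho_p(n)$ defined as follows:

$$\rho_p(n) = \begin{cases} 1, & p \mid n, \\ 0, & p \nmid n. \end{cases} \tag{2.6}$$

In terms of the functions $\rho_p(n)$, we can write

$$\frac{\phi(n)}{n} = \prod_p \Bigl(1 - \frac{\rho_p(n)}{p}\Bigr). \tag{2.7}$$

Observe now that, if $\epsilon_j$ is either 0 or 1, then

$$D\{\rho_{p_1}(n) = \epsilon_1, \rho_{p_2}(n) = \epsilon_2, \ldots, \rho_{p_k}(n) = \epsilon_k\} = D\{\rho_{p_1}(n) = \epsilon_1\}\,D\{\rho_{p_2}(n) = \epsilon_2\} \cdots D\{\rho_{p_k}(n) = \epsilon_k\}. \tag{2.8}$$

This is simply another way of stating that the “events” of being
divisible by $p_1, p_2, \ldots, p_k$ are independent (or that the
functions $\rho_p(n)$ are independent).

Property (2.8) implies that

$$M\Bigl\{\prod_{p \le p_k} \Bigl(1 - \frac{\rho_p(n)}{p}\Bigr)\Bigr\} = \prod_{p \le p_k} M\Bigl\{1 - \frac{\rho_p(n)}{p}\Bigr\} = \prod_{p \le p_k} \Bigl(1 - \frac{1}{p^2}\Bigr) \tag{2.9}$$

and it suggests that

$$M\Bigl\{\frac{\phi(n)}{n}\Bigr\} = M\Bigl\{\prod_p \Bigl(1 - \frac{\rho_p(n)}{p}\Bigr)\Bigr\} = \prod_p \Bigl(1 - \frac{1}{p^2}\Bigr). \tag{2.10}$$

Unfortunately, (2.10) cannot be derived directly from (2.9)
because the density $D$ is not completely additive.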

On the other hand, (2.10) can be easily derived as fol- 
lows: 

From (2.5) it follows that 

n(d) 

> 

d 


( 2 . 11 ) 


4>(n) 


-E 

d\n 


where n(d) is the Mobius function defined as follows: 

1 . m(1) = 1 . 

2. n(m) = 0, if m is divisible by a square of a prime. 

3. ju (m) — (—l)", if w is a product of v distinct primes. 

It now follows that 


( 2 . 12 ) 


i * m = i * m(<o m 
iV„=i n iV d =i d LdJ 


and hence that 



1 6 


Now set 

f(2) 

7 r 2 

(2.14) 

/»(») = n ( 


and consider 

P<Pk\ 

v / 

<f>(n) 

We clearly have 

fa(n ) - 

n 

• / \ 

(2.15) 

0 < A(») - 

<t>{n) 

< 1 , 
n 
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and moreover by (2.13) and (2.9) 


(2.16) M /*(») - 


<t>(n) 


n 


Now, for l > 1, 
(2.17) 0 <h\n) 


= H (l - 4) 

p<pk\ p / 

n* ■ ( 



1 - 


V 



fk(n) 


4>{n) 


n 


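and hence

$\dfrac{1}{N}\sum_{n=1}^{N} f_k^{\,l}(n) \ge \dfrac{1}{N}\sum_{n=1}^{N}\left(\dfrac{\phi(n)}{n}\right)^{l} \ge \dfrac{1}{N}\sum_{n=1}^{N} f_k^{\,l}(n) - \dfrac{l}{N}\sum_{n=1}^{N}\left(f_k(n) - \dfrac{\phi(n)}{n}\right).$

Letting $N \to \infty$ we obtain

(2.18) $M\{f_k^{\,l}(n)\} \ge \limsup_{N \to \infty} \dfrac{1}{N}\sum_{n=1}^{N}\left(\dfrac{\phi(n)}{n}\right)^{l} \ge \liminf_{N \to \infty} \dfrac{1}{N}\sum_{n=1}^{N}\left(\dfrac{\phi(n)}{n}\right)^{l} \ge M\{f_k^{\,l}(n)\} - l\, M\left\{f_k(n) - \dfrac{\phi(n)}{n}\right\}.$

But

$M\{f_k^{\,l}(n)\} = M\left\{\prod_{p \le p_k}\left(1 - \dfrac{\rho_p(n)}{p}\right)^{l}\right\} = \prod_{p \le p_k} M\left\{\left(1 - \dfrac{\rho_p(n)}{p}\right)^{l}\right\} = \prod_{p \le p_k}\left[1 - \dfrac{1}{p} + \dfrac{1}{p}\left(1 - \dfrac{1}{p}\right)^{l}\right],$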
and combining this with (2.16) and (2.18) we obtain, by
letting $k \to \infty$,

(2.19) $M\left\{\left(\dfrac{\phi(n)}{n}\right)^{l}\right\} = \prod_{p}\left[1 - \dfrac{1}{p} + \dfrac{1}{p}\left(1 - \dfrac{1}{p}\right)^{l}\right],$

a formula due to I. Schur.
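Schur's formula lends itself to a quick numerical comparison between the empirical mean of $(\phi(n)/n)^l$ and the Euler product truncated at the primes up to $N$. The sketch below is illustrative only; the names `empirical_moment` and `schur_product` are assumptions, and the truncation of the product at $N$ is a convenience, not something the text prescribes.

```python
def totients(N):
    """phi[n] = Euler's phi(n) for 0 <= n <= N, computed with a sieve."""
    phi = list(range(N + 1))
    for p in range(2, N + 1):
        if phi[p] == p:  # p is prime
            for m in range(p, N + 1, p):
                phi[m] -= phi[m] // p
    return phi


def empirical_moment(l, phi, N):
    """(1/N) * sum over n <= N of (phi(n)/n)**l."""
    return sum((phi[n] / n) ** l for n in range(1, N + 1)) / N


def schur_product(l, primes):
    """Truncated product of 1 - 1/p + (1/p) * (1 - 1/p)**l over the given primes."""
    result = 1.0
    for p in primes:
        result *= 1.0 - 1.0 / p + (1.0 / p) * (1.0 - 1.0 / p) ** l
    return result


N = 100_000
phi = totients(N)
primes = [p for p in range(2, N + 1) if phi[p] == p - 1]  # phi(p) = p - 1 iff p is prime

for l in (1, 2, 3):
    print(l, empirical_moment(l, phi, N), schur_product(l, primes))
```

For $l = 1$ the product reduces to $\prod_p (1 - 1/p^2)$, in agreement with (2.13).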

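Formally (2.19) follows in one line:

$M\left\{\left(\dfrac{\phi(n)}{n}\right)^{l}\right\} = M\left\{\prod_{p}\left(1 - \dfrac{\rho_p(n)}{p}\right)^{l}\right\} = \prod_{p} M\left\{\left(1 - \dfrac{\rho_p(n)}{p}\right)^{l}\right\} = \prod_{p}\left[1 - \dfrac{1}{p} + \dfrac{1}{p}\left(1 - \dfrac{1}{p}\right)^{l}\right],$

but because $D$ is not completely additive one needs the
justification given above.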

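From (2.7) we have

(2.20) $\log \dfrac{\phi(n)}{n} = \sum_{p} \log\left(1 - \dfrac{\rho_p(n)}{p}\right) = \sum_{p} \rho_p(n) \log\left(1 - \dfrac{1}{p}\right)$

and formally, for every real $\xi$,

(2.21) $M\left\{\exp\left(i\xi \log \dfrac{\phi(n)}{n}\right)\right\} = \prod_{p} M\left\{\exp\left[i\xi \rho_p(n) \log\left(1 - \dfrac{1}{p}\right)\right]\right\} = \prod_{p}\left(1 - \dfrac{1}{p} + \dfrac{1}{p}\exp\left[i\xi \log\left(1 - \dfrac{1}{p}\right)\right]\right) = c(\xi).$

A rigorous justification of (2.21) is almost identical with
the one given for (2.19) and can be left to the reader.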

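Let now $K_N(\omega)$ be the number of integers $n$ not exceeding
$N$ for which

$\log \dfrac{\phi(n)}{n} < \omega.$

Set

(2.22) $\sigma_N(\omega) = \dfrac{K_N(\omega)}{N}$

and note that $\sigma_N(\omega)$ is a distribution function and that

(2.23) $\int_{-\infty}^{\infty} e^{i\xi\omega}\, d\sigma_N(\omega) = \dfrac{\exp\left(i\xi \log \frac{\phi(1)}{1}\right) + \cdots + \exp\left(i\xi \log \frac{\phi(N)}{N}\right)}{N}.$

From (2.21) it follows that

(2.24) $\lim_{N \to \infty} \int_{-\infty}^{\infty} e^{i\xi\omega}\, d\sigma_N(\omega) = M\left\{\exp\left(i\xi \log \dfrac{\phi(n)}{n}\right)\right\} = c(\xi),$

and it is easily seen that $c(\xi)$ is continuous at $\xi = 0$. Thus
by the theorem stated at the end of § 4, Chapter 3, there
exists a distribution function $\sigma(\omega)$ such that

(2.25) $\int_{-\infty}^{\infty} e^{i\xi\omega}\, d\sigma(\omega) = c(\xi) = \prod_{p}\left(1 - \dfrac{1}{p} + \dfrac{1}{p}\exp\left[i\xi \log\left(1 - \dfrac{1}{p}\right)\right]\right)$

and such that

(2.26) $\lim_{N \to \infty} \sigma_N(\omega) = \sigma(\omega)$

at each point of continuity of $\sigma(\omega)$. It is now easy to prove
that $\sigma(\omega)$ is continuous for every $\omega$. To do this, we use
the result of Problem 3 (page 45, Chapter 3).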

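We have

(2.27) $|c(\xi)|^2 = \prod_{p}\left[\left(1 - \dfrac{1}{p}\right)^2 + \dfrac{2}{p}\left(1 - \dfrac{1}{p}\right)\cos\left(\xi \log\left(1 - \dfrac{1}{p}\right)\right) + \dfrac{1}{p^2}\right]$

and one can show (see Problem 1 following this section)
that the numbers

$\log\left(1 - \dfrac{1}{p}\right)$

are linearly independent.

By considerations of § 5, Chapter 3, we have

$\lim_{T \to \infty} \dfrac{1}{T}\int_0^T \prod_{p \le p_k}\left[\left(1 - \dfrac{1}{p}\right)^2 + \dfrac{2}{p}\left(1 - \dfrac{1}{p}\right)\cos\left(\xi \log\left(1 - \dfrac{1}{p}\right)\right) + \dfrac{1}{p^2}\right] d\xi$

$= \prod_{p \le p_k} \dfrac{1}{2\pi}\int_0^{2\pi}\left[\left(1 - \dfrac{1}{p}\right)^2 + \dfrac{2}{p}\left(1 - \dfrac{1}{p}\right)\cos\theta + \dfrac{1}{p^2}\right] d\theta = \prod_{p \le p_k}\left[\left(1 - \dfrac{1}{p}\right)^2 + \dfrac{1}{p^2}\right],$

and from elementary facts about the primes we know that

$\lim_{k \to \infty} \prod_{p \le p_k}\left[\left(1 - \dfrac{1}{p}\right)^2 + \dfrac{1}{p^2}\right] = 0.$

Thus, it follows that

(2.28) $\lim_{T \to \infty} \dfrac{1}{T}\int_0^T |c(\xi)|^2\, d\xi = 0$

and consequently $\sigma(\omega)$ is continuous for all $\omega$. To summarize:

The density

$D\left\{\log \dfrac{\phi(n)}{n} < \omega\right\} = \sigma(\omega)$

exists for every $\omega$, $\sigma(\omega)$ is continuous, and

$\int_{-\infty}^{\infty} e^{i\xi\omega}\, d\sigma(\omega) = \prod_{p}\left(1 - \dfrac{1}{p} + \dfrac{1}{p}\exp\left[i\xi \log\left(1 - \dfrac{1}{p}\right)\right]\right).$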

This result (first obtained by I. Schoenberg) can be derived
in a more elementary way, and it has been vastly generalized
by P. Erdős.* We have chosen the more circuitous
route to bring out the peculiarly probabilistic flavor of the
result and to exhibit the interplay of a variety of ideas and
techniques.

Formula (2.21) is a clear analogue of the formula

$\dfrac{\sin \xi}{\xi} = \prod_{k=1}^{\infty} \cos \dfrac{\xi}{2^k}$

with which we have started. It is, in a manner of speaking,
a variation on the same theme, and the fact that a theme
allows such diversified variations is a clear tribute to the
richness of its “melodic” content.

* Erdős has also proved the remarkable theorem that our $\sigma(\omega)$
is singular, i.e., $\sigma'(\omega) = 0$ almost everywhere.


PROBLEMS 


1. Prove that the numbers $\log(1 - (1/p))$ as well as $\log(1 + (1/p))$
are linearly independent.

2. Statistics of $\sigma(n)$ (sum of divisors of $n$).

(a) Let $\alpha_p(n)$ be defined as the power with which the prime $p$
appears in the (unique) representation of $n$ as a product of powers of
its prime divisors, i.e.,

$n = \prod_{p} p^{\alpha_p(n)}.$

Prove that the functions $\alpha_p(n)$ are statistically independent.

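(b) Show that if $\sigma(n)$ denotes the sum of all divisors of $n$ then

$\dfrac{\sigma(n)}{n} = \prod_{p}\left(1 + \dfrac{1}{p} + \cdots + \dfrac{1}{p^{\alpha_p(n)}}\right).$

(c) Using the fact that

$\dfrac{\sigma(n)}{n} = \sum_{d \mid n} \dfrac{1}{d},$

prove that

$M\left\{\dfrac{\sigma(n)}{n}\right\} = \dfrac{\pi^2}{6}.$

(d) Show that

$M\left\{\prod_{p \le p_k}\left(1 + \dfrac{1}{p} + \cdots + \dfrac{1}{p^{\alpha_p(n)}}\right)\right\} = \prod_{p \le p_k} \dfrac{1}{1 - \dfrac{1}{p^2}}.$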

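(e) Set

$f_k(n) = \prod_{p \le p_k}\left(1 + \dfrac{1}{p} + \cdots + \dfrac{1}{p^{\alpha_p(n)}}\right);$

note that

$\dfrac{\sigma(n)}{n} = f_k(n) \prod_{p > p_k}\left(1 + \dfrac{1}{p} + \cdots + \dfrac{1}{p^{\alpha_p(n)}}\right)$

and hence derive the inequality

$\prod_{p > p_k}\left(1 - \dfrac{\rho_p(n)}{p}\right) \le \dfrac{f_k(n)}{\sigma(n)/n} \le \prod_{p > p_k} \dfrac{1}{1 + \dfrac{\rho_p(n)}{p}}.$

(f) Show that

$M\left\{\exp\left(i\xi \log \dfrac{\sigma(n)}{n}\right)\right\} = \prod_{p} M\left\{\exp\left[i\xi \log\left(1 + \dfrac{1}{p} + \cdots + \dfrac{1}{p^{\alpha_p(n)}}\right)\right]\right\}$

$= \prod_{p}\left[1 - \dfrac{1}{p} + \sum_{\alpha=1}^{\infty}\left(\dfrac{1}{p^{\alpha}} - \dfrac{1}{p^{\alpha+1}}\right)\exp\left[i\xi \log\left(1 + \dfrac{1}{p} + \cdots + \dfrac{1}{p^{\alpha}}\right)\right]\right]$

$= c(\xi).$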

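(g) Using the fact that

$\left|1 - \dfrac{1}{p} + \sum_{\alpha=1}^{\infty}\left(\dfrac{1}{p^{\alpha}} - \dfrac{1}{p^{\alpha+1}}\right)\exp\left[i\xi \log\left(1 + \dfrac{1}{p} + \cdots + \dfrac{1}{p^{\alpha}}\right)\right]\right|$

$\le \left|1 - \dfrac{1}{p} + \dfrac{1}{p}\left(1 - \dfrac{1}{p}\right)\exp\left[i\xi \log\left(1 + \dfrac{1}{p}\right)\right]\right| + \dfrac{1}{p^2}$

$= \left(1 - \dfrac{1}{p}\right)\sqrt{1 + \dfrac{2}{p}\cos\left[\xi \log\left(1 + \dfrac{1}{p}\right)\right] + \dfrac{1}{p^2}} + \dfrac{1}{p^2}$

as well as the fact that the numbers $\log(1 + (1/p))$ are linearly
independent, prove that the density

$D\left\{\dfrac{\sigma(n)}{n} < \omega\right\}$

exists and is a continuous function of $\omega$.

This result, first proved by H. Davenport, is included in a more
general theorem of Erdős.

The case $\omega = 2$ is of particular interest because it shows that
“abundant numbers” (i.e., numbers for which $\sigma(n) > 2n$) as well as
“deficient numbers” (i.e., numbers for which $\sigma(n) < 2n$) have a
density. It also follows that “perfect numbers” (for which $\sigma(n) = 2n$)
have density 0. It is conjectured that there are only a finite
number of “perfect numbers.”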
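3. The inversion formula. Prove that if

$\int_{-\infty}^{\infty} e^{i\xi\omega}\, d\sigma(\omega) = c(\xi),$

where $\sigma(\omega)$ is a distribution function, then

$\dfrac{1}{2\pi}\int_{-\infty}^{\infty} \dfrac{e^{-i\omega_1\xi} - e^{-i\omega_2\xi}}{i\xi}\, c(\xi)\, d\xi = \sigma(\omega_2) - \sigma(\omega_1)$

if $\omega_1$ and $\omega_2$ are continuity points of $\sigma$. If either $\omega_1$ or $\omega_2$ (or both)
are points of discontinuity, then $\sigma(\omega_1)$ or $\sigma(\omega_2)$ (or both) should be
replaced by

$\dfrac{\sigma(\omega_1 - 0) + \sigma(\omega_1 + 0)}{2}$ or $\dfrac{\sigma(\omega_2 - 0) + \sigma(\omega_2 + 0)}{2}.$

In particular show that

$D\left\{\omega_1 < \log \dfrac{\phi(n)}{n} < \omega_2\right\} = \dfrac{1}{2\pi}\int_{-\infty}^{\infty} \dfrac{e^{-i\omega_1\xi} - e^{-i\omega_2\xi}}{i\xi} \prod_{p}\left(1 - \dfrac{1}{p} + \dfrac{1}{p}\exp\left[i\xi \log\left(1 - \dfrac{1}{p}\right)\right]\right) d\xi,$

an explicit but nearly useless formula!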


3. Another application. Let $\omega(n)$ denote the number
of prime divisors of $n$ counting multiplicity, i.e.,

(3.1) $\omega(n) = \sum_{p} \alpha_p(n),$

where the $\alpha$’s have been defined in Problem 2, § 2, of this
chapter.

Let $\nu(n)$ denote the number of prime divisors of $n$ not
counting multiplicity, i.e.,

(3.2) $\nu(n) = \sum_{p} \rho_p(n).$

The difference $\omega(n) - \nu(n)$ will be called the excess, and we
shall determine the density of integers for which the excess
is equal to $k$ ($k \ge 0$, an integer), i.e.,

(3.3) $d_k = D\{\omega(n) - \nu(n) = k\}.$
Needless to say, the existence of the density is not obvious
and needs to be established.

We start with formula (5.3) of Chapter 1

(3.4) $\dfrac{1}{2\pi}\int_0^{2\pi} e^{imx}\, dx = \begin{cases} 1, & m = 0, \\ 0, & m \ne 0, \end{cases}$

where $m$ is an integer, and consider

(3.5) $\dfrac{1}{N}\sum_{n=1}^{N} \dfrac{1}{2\pi}\int_0^{2\pi} e^{i(\omega(n) - \nu(n) - k)x}\, dx = \dfrac{1}{2\pi}\int_0^{2\pi} e^{-ikx}\left(\dfrac{1}{N}\sum_{n=1}^{N} e^{i(\omega(n) - \nu(n))x}\right) dx.$

The left-hand side of (3.5) represents [in view of (3.4)] the
fraction of integers $n \le N$ whose excess is exactly $k$. Thus

(3.6) $d_k = \lim_{N \to \infty} \dfrac{1}{N}\sum_{n=1}^{N} \dfrac{1}{2\pi}\int_0^{2\pi} e^{i(\omega(n) - \nu(n) - k)x}\, dx$

if the limit exists.

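Invoking once again the principle of bounded convergence,
we see from (3.5) that it is enough to prove that
for every real $x$ the limit

(3.7) $\lim_{N \to \infty} \dfrac{1}{N}\sum_{n=1}^{N} e^{i(\omega(n) - \nu(n))x} = M\{e^{i(\omega(n) - \nu(n))x}\}$

exists.

Now

$\omega(n) - \nu(n) = \sum_{p} (\alpha_p(n) - \rho_p(n)),$

and the functions $\alpha_p(n) - \rho_p(n)$ are easily seen to be independent.
This suggests that not only does the limit (3.7)
exist but also

(3.8) $M\{e^{i(\omega(n) - \nu(n))x}\} = M\left\{\exp\left[ix \sum_{p} (\alpha_p(n) - \rho_p(n))\right]\right\} = \prod_{p} M\{e^{ix(\alpha_p(n) - \rho_p(n))}\}$

$= \prod_{p}\left[1 - \dfrac{1}{p} + \sum_{\alpha=1}^{\infty}\left(\dfrac{1}{p^{\alpha}} - \dfrac{1}{p^{\alpha+1}}\right) e^{ix(\alpha - 1)}\right] = \prod_{p}\left(1 - \dfrac{1}{p}\right)\left(1 + \dfrac{1}{p - e^{ix}}\right).$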

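A rigorous justification is easy, and it follows along lines
similar to those of § 2. Take first

$\sum_{n=1}^{N} (\alpha_p(n) - \rho_p(n)),$

and consider the integers $n$, $1 \le n \le N$, for which

$\alpha_p(n) = \beta.$

These are the integers divisible by $p^{\beta}$ but not divisible by
$p^{\beta+1}$, and hence their number is equal to

$\left[\dfrac{N}{p^{\beta}}\right] - \left[\dfrac{N}{p^{\beta+1}}\right].$

It thus follows that

(3.9) $\sum_{n=1}^{N} (\alpha_p(n) - \rho_p(n)) = \sum_{\beta \ge 2} (\beta - 1)\left(\left[\dfrac{N}{p^{\beta}}\right] - \left[\dfrac{N}{p^{\beta+1}}\right]\right).$

Let now

(3.10) $g_k(n) = \sum_{p > p_k} (\alpha_p(n) - \rho_p(n)),$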
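and note that (3.9) implies that

(3.11) $M\{g_k(n)\} = \sum_{p > p_k}\sum_{\beta \ge 2} (\beta - 1)\left(\dfrac{1}{p^{\beta}} - \dfrac{1}{p^{\beta+1}}\right) < \sum_{p > p_k}\sum_{\beta \ge 2} \dfrac{\beta - 1}{p^{\beta}} = \sum_{p > p_k} \dfrac{1}{(p - 1)^2}.$

Now

(3.12) $\dfrac{1}{N}\sum_{n=1}^{N} e^{ix(\omega(n) - \nu(n))} = \dfrac{1}{N}\sum_{n=1}^{N} \exp\left[ix \sum_{p \le p_k} (\alpha_p(n) - \rho_p(n))\right] e^{ix g_k(n)},$

and hence

$\left|\dfrac{1}{N}\sum_{n=1}^{N} e^{ix(\omega(n) - \nu(n))} - \dfrac{1}{N}\sum_{n=1}^{N} \exp\left[ix \sum_{p \le p_k} (\alpha_p(n) - \rho_p(n))\right]\right|$

$= \left|\dfrac{1}{N}\sum_{n=1}^{N} \exp\left[ix \sum_{p \le p_k} (\alpha_p(n) - \rho_p(n))\right]\left(e^{ix g_k(n)} - 1\right)\right|$

$\le \dfrac{1}{N}\sum_{n=1}^{N} \left|e^{ix g_k(n)} - 1\right| \le \dfrac{|x|}{N}\sum_{n=1}^{N} g_k(n).$

Since

$\lim_{N \to \infty} \dfrac{1}{N}\sum_{n=1}^{N} \exp\left[ix \sum_{p \le p_k} (\alpha_p(n) - \rho_p(n))\right] = M\left\{\exp\left[ix \sum_{p \le p_k} (\alpha_p(n) - \rho_p(n))\right]\right\}$

$= \prod_{p \le p_k} M\{e^{ix(\alpha_p(n) - \rho_p(n))}\} = \prod_{p \le p_k}\left(1 - \dfrac{1}{p}\right)\left(1 + \dfrac{1}{p - e^{ix}}\right),$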
we see that (3.11) implies that the distance of every limit
point of the sequence

$\dfrac{1}{N}\sum_{n=1}^{N} e^{ix(\omega(n) - \nu(n))}$

from

$\prod_{p \le p_k}\left(1 - \dfrac{1}{p}\right)\left(1 + \dfrac{1}{p - e^{ix}}\right)$

is less than

$|x| \sum_{p > p_k} \dfrac{1}{(p - 1)^2}.$

Since $k$ is arbitrary it follows at once that

(3.13) $\lim_{N \to \infty} \dfrac{1}{N}\sum_{n=1}^{N} e^{ix(\omega(n) - \nu(n))} = M\{e^{ix(\omega(n) - \nu(n))}\} = \prod_{p}\left(1 - \dfrac{1}{p}\right)\left(1 + \dfrac{1}{p - e^{ix}}\right),$

and thus (3.8) is justified.

Going back to (3.5) and (3.6) we obtain

(3.14) $d_k = D\{\omega(n) - \nu(n) = k\} = \dfrac{1}{2\pi}\int_0^{2\pi} e^{-ikx} \prod_{p}\left(1 - \dfrac{1}{p}\right)\left(1 + \dfrac{1}{p - e^{ix}}\right) dx.$

Consider now the function

(3.15) $F(z) = \prod_{p}\left(1 - \dfrac{1}{p}\right)\left(1 + \dfrac{1}{p - z}\right)$

and note that it is analytic in the whole plane, except for
simple poles at $z = 2, 3, 5, \ldots$. In particular, $F(z)$ is
analytic in the circle $|z| < 2$, and we can expand $F$ in a
power series

$F(z) = \sum_{k=0}^{\infty} a_k z^k,$

whose radius of convergence is 2.

What are the coefficients $a_k$? If we use the familiar
formula

$a_k = \dfrac{1}{2\pi i}\int \dfrac{F(z)}{z^{k+1}}\, dz,$

where the integral is taken over the circle $|z| = 1$, we obtain
by substituting $z = e^{ix}$ that

$a_k = d_k.$

In other terms,

(3.16) $\sum_{k=0}^{\infty} d_k z^k = \prod_{p}\left(1 - \dfrac{1}{p}\right)\left(1 + \dfrac{1}{p - z}\right).$

This beautiful formula was discovered by A. Rényi in a
different way.

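Although it is cumbersome to get explicit formulas for
$d_k$, it is quite easy to determine the asymptotic behavior
of $d_k$ for large $k$.

In fact, $F(z)$ can be written in the form

$F(z) = \dfrac{A}{z - 2} + G(z),$

where $G(z)$ is analytic in the circle $|z| < 3$ and $A$ (the
residue of the pole at 2) is given by the formula

$A = -\dfrac{1}{2}\prod_{p > 2}\left(1 - \dfrac{1}{p}\right)\left(1 + \dfrac{1}{p - 2}\right).$

Thus

$F(z) = \dfrac{1}{4}\prod_{p > 2}\left(1 - \dfrac{1}{p}\right)\left(1 + \dfrac{1}{p - 2}\right)\sum_{k=0}^{\infty} \dfrac{z^k}{2^k} + \sum_{k=0}^{\infty} b_k z^k,$

where the radius of convergence of $\sum b_k z^k$ is 3. Since

$d_k = \dfrac{1}{4}\prod_{p > 2}\left(1 - \dfrac{1}{p}\right)\left(1 + \dfrac{1}{p - 2}\right)\dfrac{1}{2^k} + b_k$

and

$\limsup_{k \to \infty} |b_k|^{1/k} = \dfrac{1}{3},$

we have, for $k \to \infty$,

(3.17) $d_k \sim \dfrac{1}{2^{k+2}}\prod_{p > 2}\left(1 - \dfrac{1}{p}\right)\left(1 + \dfrac{1}{p - 2}\right)$

or

(3.18) $\lim_{k \to \infty} 2^{k+2} d_k = \prod_{p > 2}\left(1 - \dfrac{1}{p}\right)\left(1 + \dfrac{1}{p - 2}\right).$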

Two special cases of (3.16) merit attention. Setting $z = 0$,
we get

$d_0 = \prod_{p}\left(1 - \dfrac{1}{p^2}\right) = \dfrac{1}{\zeta(2)} = \dfrac{6}{\pi^2}.$

This is a well-known result to the effect that the density of
“square-free” numbers (i.e., not divisible by a perfect
square) is $6/\pi^2$.
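The empirical distribution of the excess is easy to tabulate, and its value at $k = 0$ can be compared with $6/\pi^2$. The sketch below is an illustration under the assumption that a direct sieve over prime powers is adequate; the names `excess_table` and `excess_frequencies` are hypothetical and do not appear in the text.

```python
import math
from collections import Counter


def excess_table(N):
    """excess[n] = omega(n) - nu(n) for 0 <= n <= N.

    Every prime power p**b with b >= 2 that divides n contributes 1,
    which sums to (alpha_p(n) - 1) over the primes p dividing n.
    """
    excess = [0] * (N + 1)
    is_composite = bytearray(N + 1)
    for p in range(2, N + 1):
        if is_composite[p]:
            continue
        for m in range(2 * p, N + 1, p):
            is_composite[m] = 1
        q = p * p
        while q <= N:
            for m in range(q, N + 1, q):
                excess[m] += 1
            q *= p
    return excess


def excess_frequencies(N):
    """Map k -> fraction of n in 1..N whose excess equals k."""
    counts = Counter(excess_table(N)[1:])
    return {k: c / N for k, c in sorted(counts.items())}


freqs = excess_frequencies(100_000)
print(freqs[0], 6 / math.pi ** 2)  # empirical d_0 versus the square-free density
print(sum(freqs.values()))         # the finite frequencies always add up to 1
```

For finite `N` the frequencies trivially sum to 1; the nontrivial statement in the text concerns the limiting densities $d_k$.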

Setting $z = 1$, we obtain

$\sum_{k=0}^{\infty} d_k = \prod_{p}\left(1 - \dfrac{1}{p}\right)\left(1 + \dfrac{1}{p - 1}\right) = 1.$

Since the sets of integers for which $\omega(n) - \nu(n) = k$ are
mutually exclusive and together exhaust the whole set of
integers, this result would be entirely trivial, were the
density completely additive. Since it is not, the fact that
we nevertheless get

$\sum_{k=0}^{\infty} d_k = 1$

is at least mildly amusing.

4. Almost every integer $m$ has approximately
$\log \log m$ prime divisors. Consider the integers $m$,
$1 \le m \le n$, for which either

(4.1) $\nu(m) < \log \log n - g_n \sqrt{\log \log n}$

or

$\nu(m) > \log \log n + g_n \sqrt{\log \log n},$

where $g_n$ is a sequence approaching infinity:

(4.2) $\lim_{n \to \infty} g_n = \infty.$

Let their number be denoted by $K_n$, and let us try to
estimate $K_n$. We use the Tchebysheff trick explained in
§ 1 of Chapter 2.

We have

(4.3) $\sum_{m=1}^{n} (\nu(m) - \log \log n)^2 \ge {\sum}' (\nu(m) - \log \log n)^2,$

where the prime on the summation sign indicates that the
summation is extended only over integers $m$ satisfying
(4.1).

Clearly

(4.4) ${\sum}' (\nu(m) - \log \log n)^2 \ge K_n g_n^2 \log \log n,$

and hence, by (4.3),

(4.5) $\dfrac{K_n}{n} \le \dfrac{1}{n g_n^2 \log \log n} \sum_{m=1}^{n} (\nu(m) - \log \log n)^2.$
It remains to estimate

(4.6) $\sum_{m=1}^{n} (\nu(m) - \log \log n)^2 = \sum_{m=1}^{n} \nu^2(m) - 2 \log \log n \sum_{m=1}^{n} \nu(m) + n (\log \log n)^2.$

Now

$\nu(m) = \sum_{p} \rho_p(m),$

and

$\nu^2(m) = \sum_{p} \rho_p(m) + 2 \sum_{p < q} \rho_p(m) \rho_q(m)$

($\rho_p^2 = \rho_p$); consequently

(4.7) $\sum_{m=1}^{n} \nu(m) = \sum_{p} \left[\dfrac{n}{p}\right]$

and

(4.8) $\sum_{m=1}^{n} \nu^2(m) = \sum_{p} \left[\dfrac{n}{p}\right] + 2 \sum_{p < q} \left[\dfrac{n}{pq}\right].$

In (4.7) and (4.8) the summation is only over primes $p$
and $q$ which are less than or equal to $n$, and thus

(4.9) $\sum_{m=1}^{n} \nu(m) \ge n \sum_{p \le n} \dfrac{1}{p} - \pi(n),$

where $\pi(n)$ denotes the number of primes which do not
exceed $n$; similarly

(4.10) $\sum_{m=1}^{n} \nu^2(m) \le n \sum_{p \le n} \dfrac{1}{p} + 2n \sum_{p < q \le n} \dfrac{1}{pq} \le n \sum_{p \le n} \dfrac{1}{p} + n \left(\sum_{p \le n} \dfrac{1}{p}\right)^2.$
It is known that

(4.11) $\sum_{p \le n} \dfrac{1}{p} = \log \log n + \epsilon_n,$

where $\epsilon_n$ is bounded, and hence

$\sum_{m=1}^{n} \nu^2(m) \le n (\log \log n)^2 + 2n \epsilon_n \log \log n + n \epsilon_n^2 + n \log \log n + n \epsilon_n$

and

$\sum_{m=1}^{n} \nu(m) \ge n \log \log n + n \epsilon_n - \pi(n).$

Finally, (4.6) yields

$\sum_{m=1}^{n} (\nu(m) - \log \log n)^2 \le n \epsilon_n^2 + n \log \log n + n \epsilon_n + 2 \pi(n) \log \log n,$

and consequently

$\dfrac{K_n}{n} \le \dfrac{1}{g_n^2} + \dfrac{\epsilon_n^2}{g_n^2 \log \log n} + \dfrac{\epsilon_n}{g_n^2 \log \log n} + 2\, \dfrac{\pi(n)}{n}\, \dfrac{1}{g_n^2}.$

Since $\epsilon_n$ is bounded, $\pi(n) < n$, and $g_n \to \infty$, it follows
that

(4.12) $\lim_{n \to \infty} \dfrac{K_n}{n} = 0.$


Because of the slowness with which $\log \log m$ changes,
(4.12) implies the following:

If $l_n$ denotes the number of integers, $1 \le m \le n$, for
which either

(4.13) $\nu(m) < \log \log m - g_m \sqrt{\log \log m}$

or

$\nu(m) > \log \log m + g_m \sqrt{\log \log m},$

then

(4.14) $\lim_{n \to \infty} \dfrac{l_n}{n} = 0.$

The proof is left to the reader (see Problem 1 at the end of
this section). The theorem embodied in (4.14) was first
proved by Hardy and Ramanujan in 1917. It is they who
stated it in the picturesque way that almost every integer
$m$ has approximately $\log \log m$ prime divisors. The proof
reproduced above is due to P. Turán, and it is much
simpler than the original proof of Hardy and Ramanujan.
As the reader can see, Turán’s proof is a direct analogue of
the proof of the weak law of large numbers, which we gave
in § 1 of Chapter 2. Here then is another example of ideas
borrowed from one field yielding fruitful applications in
another.
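The Tchebysheff-type bound (4.5) can be observed numerically: the fraction of exceptional integers never exceeds the normalized second moment. The following is a hedged sketch with illustrative names (`nu_table`, `exceptional_fraction_and_bound`), and a constant `g` stands in for the sequence $g_n$.

```python
import math


def nu_table(n):
    """nu[m] = number of distinct prime divisors of m, for 0 <= m <= n."""
    nu = [0] * (n + 1)
    for p in range(2, n + 1):
        if nu[p] == 0:  # no smaller prime divides p, so p is prime
            for m in range(p, n + 1, p):
                nu[m] += 1
    return nu


def exceptional_fraction_and_bound(n, g):
    """Return (K_n / n, right-hand side of (4.5)) for a fixed value g of g_n."""
    nu = nu_table(n)
    L = math.log(math.log(n))
    second_moment = sum((nu[m] - L) ** 2 for m in range(1, n + 1))
    K = sum(1 for m in range(1, n + 1) if abs(nu[m] - L) > g * math.sqrt(L))
    return K / n, second_moment / (n * g * g * L)


fraction, bound = exceptional_fraction_and_bound(100_000, 2.0)
print(fraction, bound)
```

Because $\log \log n$ grows so slowly, the numbers for any feasible `n` only hint at the limiting statement (4.12).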


PROBLEMS 

1. Prove (4.14). (Hint: Let $0 < \alpha < 1$; consider only integers in
the range $n^{\alpha} \le m \le n$, and show that every integer $m$ in this range
satisfying

$|\nu(m) - \log \log m| > g_m \sqrt{\log \log m}$

satisfies also

$|\nu(m) - \log \log n| > h_n \sqrt{\log \log n},$

with an appropriately chosen $h_n \to \infty$.)

2. Prove (4.12) for $\omega(m)$.

5. The normal law in number theory. The fact
that $\nu(m)$, the number of prime divisors of $m$, is the sum

(5.1) $\sum_{p} \rho_p(m)$

of independent functions suggests that, in some sense, the
distribution of values of $\nu(m)$ may be given by the normal
law. This is indeed the case, and in 1939 Erdős and Kac
proved the following theorem:

Let $K_n(\omega_1, \omega_2)$ be the number of integers $m$, $1 \le m \le n$,
for which

(5.2) $\log \log n + \omega_1 \sqrt{\log \log n} < \nu(m) < \log \log n + \omega_2 \sqrt{\log \log n};$

then

(5.3) $\lim_{n \to \infty} \dfrac{K_n(\omega_1, \omega_2)}{n} = \dfrac{1}{\sqrt{2\pi}}\int_{\omega_1}^{\omega_2} e^{-y^2/2}\, dy.$

Because of the slowness with which $\log \log n$ changes (see
Problem 1 at the end of § 4) the result (5.3) is equivalent
to the statement:

(5.4) $D\{\log \log n + \omega_1 \sqrt{\log \log n} < \nu(n) < \log \log n + \omega_2 \sqrt{\log \log n}\} = \dfrac{1}{\sqrt{2\pi}}\int_{\omega_1}^{\omega_2} e^{-y^2/2}\, dy.$

There are now several different proofs of this result (the
best being, in my opinion, a recent proof of Rényi and
Turán), none, unfortunately, sufficiently short or elementary
to be reproduced here. Consequently, we shall have
to be content with a heuristic argument based on the following
classical result of Landau:

If $\pi_k(n)$ denotes the number of integers not exceeding $n$
having exactly $k$ prime divisors, then

(5.5) $\pi_k(n) \sim \dfrac{1}{(k - 1)!}\, \dfrac{n}{\log n}\, (\log \log n)^{k-1}.$

For $k = 1$, this is the familiar prime number theorem;
for $k > 1$, (5.5) can be derived from the prime number
theorem by entirely elementary considerations.
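Now

(5.6) $K_n(\omega_1, \omega_2) = \sum_{\log \log n + \omega_1 \sqrt{\log \log n} < k < \log \log n + \omega_2 \sqrt{\log \log n}} \pi_k(n),$

and hence one might expect that

(5.7) $\dfrac{K_n(\omega_1, \omega_2)}{n} \sim \dfrac{1}{\log n} \sum_{\log \log n + \omega_1 \sqrt{\log \log n} < k < \log \log n + \omega_2 \sqrt{\log \log n}} \dfrac{(\log \log n)^{k-1}}{(k - 1)!}.$

If one recalls Problem 2, § 3, Chapter 3 and sets

(5.8) $x = \log \log n \quad \left(e^{-x} = \dfrac{1}{\log n}\right),$

one will obtain

$\lim_{n \to \infty} \dfrac{K_n(\omega_1, \omega_2)}{n} = \dfrac{1}{\sqrt{2\pi}}\int_{\omega_1}^{\omega_2} e^{-y^2/2}\, dy,$

or (5.3).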

Unfortunately, it is not easy to rigorize this highly ap- 
pealing argument because one needs uniform error esti- 
mates in Landau’s theorem (5.5), and they are not easily 
obtainable. It might be of interest to mention that the 
original proof of Hardy and Ramanujan of the theorem 
of § 4 was based essentially on (5.5), although they needed 
only certain estimates rather than the precise asymptotic 
result. The theory which we have developed in Chapter 3 
suggests a method of proving (5.3). Let $K_n(\omega)$ denote the
number of integers $m$, $1 \le m \le n$, for which

$\nu(m) < \log \log n + \omega \sqrt{\log \log n},$

and set

(5.9) $\sigma_n(\omega) = \dfrac{K_n(\omega)}{n}.$

It is clear that $\sigma_n(\omega)$ is a distribution function, and


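(5.10) $\dfrac{1}{n \log \log n} \sum_{m=1}^{n} (\nu(m) - \log \log n)^2 = \int_{-\infty}^{\infty} \omega^2\, d\sigma_n(\omega).$

If we use the precise estimate

(5.11) $\sum_{p \le n} \dfrac{1}{p} = \log \log n + C + \epsilon_n, \quad \epsilon_n \to 0,$

then the argument of § 4 gives

(5.12) $\lim_{n \to \infty} \int_{-\infty}^{\infty} \omega^2\, d\sigma_n(\omega) = 1 = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} y^2 e^{-y^2/2}\, dy.$

We also have (almost trivially!)

$\lim_{n \to \infty} \dfrac{1}{n \sqrt{\log \log n}} \sum_{m=1}^{n} (\nu(m) - \log \log n) = 0,$

and hence

(5.13) $\lim_{n \to \infty} \int_{-\infty}^{\infty} \omega\, d\sigma_n(\omega) = 0 = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} y e^{-y^2/2}\, dy.$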

If we could prove that for every integer $k > 2$

(5.14) $\lim_{n \to \infty} \int_{-\infty}^{\infty} \omega^k\, d\sigma_n(\omega) = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} y^k e^{-y^2/2}\, dy,$

it would follow that

$\lim_{n \to \infty} \int_{-\infty}^{\infty} e^{i\xi\omega}\, d\sigma_n(\omega) = e^{-\xi^2/2}$

for every real $\xi$ and hence that

(5.15) $\lim_{n \to \infty} \sigma_n(\omega) = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{\omega} e^{-y^2/2}\, dy.$

This, in view of (5.9), is nothing but our theorem (5.3).
Proving (5.14) is, of course, equivalent to proving that

(5.16) $\lim_{n \to \infty} \dfrac{1}{n (\log \log n)^{k/2}} \sum_{m=1}^{n} (\nu(m) - \log \log n)^k = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} y^k e^{-y^2/2}\, dy,$

and this in turn depends on asymptotic evaluations of sums

$\sum_{p_{l_1} \cdots p_{l_k} \le n} \dfrac{1}{p_{l_1} \cdots p_{l_k}}.$

(Recall that in § 4 Turán’s proof depended on an estimate
of

$\sum_{pq \le n} \dfrac{1}{pq}.)$

This, remarkably enough, is not at all easy, but recently
Halberstam succeeded in carrying out the proof along
these lines. This approach, without doubt, is the most
straightforward and closest in spirit to the traditional lines
of probability theory. The ultimate triumph of the probabilistic
method in number theory came with the proof by
Rényi and Turán that the error term

$\dfrac{K_n(\omega)}{n} - \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{\omega} e^{-y^2/2}\, dy$

is of the order of

$\dfrac{1}{\sqrt{\log \log n}}.$

That the error is of order $(\log \log n)^{-1/2}$ was conjectured by
LeVeque by analogy with similar estimates in probability
theory: the primes, indeed, play a game of chance!


PROBLEMS 


1. Show that (5.4) holds if $\nu(n)$ is replaced by $\omega(n)$ (i.e., number of
prime divisors counting multiplicity). (Hint: From the fact that
$M\{\omega(n) - \nu(n)\} < \infty$, deduce first that the density of the set of
integers for which $\omega(n) - \nu(n) > g_n$, $g_n \to \infty$, is 0.)

2. Let $d(n)$ denote the number of divisors of $n$.

(a) Show that

$d(n) = \prod_{p} (\alpha_p(n) + 1).$

(For definition of $\alpha_p(n)$ see Problem 2, § 2 of this chapter.)

(b) Show that

$M\left\{\dfrac{d(n)}{2^{\nu(n)}}\right\} = \prod_{p}\left(1 + \dfrac{1}{2p(p - 1)}\right) < \infty.$

(c) Using (5.4) and the hint to Problem 1 above, prove that

$D\left\{2^{\log \log n + \omega_1 \sqrt{\log \log n}} < d(n) < 2^{\log \log n + \omega_2 \sqrt{\log \log n}}\right\} = \dfrac{1}{\sqrt{2\pi}}\int_{\omega_1}^{\omega_2} e^{-y^2/2}\, dy.$
CHAPTER 5


FROM KINETIC THEORY TO 
CONTINUED FRACTIONS 


1. Paradoxes of kinetic theory. About the middle of 
the nineteenth century attempts were begun to unite the 
disciplines of mechanics and thermodynamics. 

The main problem was to derive the Second Law of 
Thermodynamics from the picture of matter as consisting 
of particles (atoms or molecules) subject to forces and 
obeying the laws of mechanics. 

In the hands of Maxwell and Boltzmann (and later 
J. W. Gibbs) this kinetic approach flowered into one of the 
most beautiful and far-reaching achievements of science. 

But the approach was marred, at the outset, by two 
paradoxes. The first, voiced in 1876 by Loschmidt, con- 
sisted in observing that the laws of mechanics are time re- 
versible (i.e., invariant under the change of t into —t). 

On the other hand the Second Law of Thermodynamics 
postulates a typically irreversible behavior. 

It thus seems impossible to ever derive the Second Law 
from purely mechanistic considerations. 

The second paradox, associated with the name of 
Zermelo, is even more decisive. 

Zermelo invoked a simple but fundamental theorem of
Poincaré to the effect that a conservative dynamical
system, satisfying certain mild conditions, has the property
that “almost every” (in a certain technical sense to be explained
below) initial state of the system is bound to recur,
to any degree of accuracy.

This too is in contradiction with irreversible behavior. 

To appreciate these paradoxes consider two containers, 
one containing a gas and the other completely evacuated. 

At some time we connect the containers. The Second 
Law predicts then that the gas will flow from the first con- 
tainer into the second and that the amount of gas in the 
first container will decrease monotonically in time. Such 
behavior of the gas shows a definite arrow of time. 

From the kinetic (mechanistic) point of view we are 
dealing with a dynamical system which cannot possibly 
show the time arrow and which moreover will behave in a 
quasi-periodic way as implied by Poincaré’s theorem.
That we have here a paradoxical situation is clear. 


2. Preliminaries. To understand Boltzmann’s reply 
we need a little review of classical dynamics. 

A system of $n$ degrees of freedom is described in terms
of $n$ generalized coordinates $q_1, q_2, \ldots, q_n$ and conjugate
momenta $p_1, p_2, \ldots, p_n$. For a conservative dynamical
system there is a function $H(q_1, \ldots, q_n; p_1, \ldots, p_n)$,
known as the Hamiltonian function, of the system which
represents its total energy.

The equations of motion are of the form

(2.1) $\dfrac{dq_i}{dt} = \dfrac{\partial H}{\partial p_i}, \quad i = 1, 2, \ldots, n,$

(2.2) $\dfrac{dp_i}{dt} = -\dfrac{\partial H}{\partial q_i}, \quad i = 1, 2, \ldots, n,$

and, if we know the initial positions $q_i(0)$ and initial momenta
$p_i(0)$, the motion [i.e., the functions $q_i(t)$ and $p_i(t)$]
is uniquely determined.

It is customary to represent the system as a point in the
$2n$-dimensional Euclidean space (phase space or $\Gamma$-space)
with coordinates $q_1, \ldots, q_n, p_1, \ldots, p_n$.

Thus at time $t$ the dynamical system is represented by
the point

$P_t = (q_1(t), \ldots, q_n(t), p_1(t), \ldots, p_n(t)).$

Now, the motion of our system defines a one-parameter
family of transformations $T_t$ by the relation

(2.3) $T_t(P_0) = P_t.$

Suppose now that we have a set $A$ of points $P_0$, and
denote by $T_t(A)$ the set of corresponding points $P_t$.

It was noticed by Liouville (the proof is quite simple 
and can be based on the generalization of the familiar 
divergence theorem to 2n-dimensional space) that the 
Hamiltonian equations of motion (2.1) and (2.2) imply 
the remarkable fact that the 2n-dimensional Lebesgue 
measures of $A$ and $T_t(A)$ are equal!

In other words the transformations $T_t$ are measure preserving,
the measure being the ordinary Lebesgue measure
in $\Gamma$-space.

Equations (2.1) and (2.2) have another important consequence,
namely, that

$H(q_1(t), \ldots, q_n(t), p_1(t), \ldots, p_n(t)) = H(q_1(0), \ldots, q_n(0), p_1(0), \ldots, p_n(0))$

(conservation of energy), and consequently the point
representing our dynamical system is constrained to lie
on an “energy surface”

(2.4) $H(q_1, \ldots, q_n, p_1, \ldots, p_n) = \text{const.}$
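Both consequences of the Hamiltonian equations, conservation of energy and preservation of phase-space measure, can be observed numerically for a system with one degree of freedom. The sketch below is an illustration only: the pendulum Hamiltonian $H = p^2/2 - \cos q$, the leapfrog integrator, the step size, and all function names are assumptions chosen for the example and are not taken from the text. The integrator approximates $T_t$; its energy is conserved only up to a small discretization error.

```python
import math


def hamiltonian(q, p):
    """Illustrative Hamiltonian of a pendulum: H = p^2/2 - cos(q)."""
    return 0.5 * p * p - math.cos(q)


def leapfrog_step(q, p, dt):
    """One leapfrog step for dq/dt = dH/dp = p, dp/dt = -dH/dq = -sin(q)."""
    p_half = p - 0.5 * dt * math.sin(q)
    q_new = q + dt * p_half
    p_new = p_half - 0.5 * dt * math.sin(q_new)
    return q_new, p_new


def flow(q0, p0, t, dt=1e-3):
    """Approximate T_t(P_0) by repeated leapfrog steps."""
    q, p = q0, p0
    for _ in range(int(round(t / dt))):
        q, p = leapfrog_step(q, p, dt)
    return q, p


def jacobian_determinant(q0, p0, t, h=1e-6):
    """Central-difference estimate of det d(q_t, p_t)/d(q_0, p_0)."""
    qa, pa = flow(q0 + h, p0, t)
    qb, pb = flow(q0 - h, p0, t)
    qc, pc = flow(q0, p0 + h, t)
    qd, pd = flow(q0, p0 - h, t)
    dq_dq0, dp_dq0 = (qa - qb) / (2 * h), (pa - pb) / (2 * h)
    dq_dp0, dp_dp0 = (qc - qd) / (2 * h), (pc - pd) / (2 * h)
    return dq_dq0 * dp_dp0 - dq_dp0 * dp_dq0


q1, p1 = flow(1.0, 0.5, 5.0)
print(hamiltonian(1.0, 0.5), hamiltonian(q1, p1))  # nearly equal energies
print(jacobian_determinant(1.0, 0.5, 5.0))         # close to 1: area is preserved
```

A Jacobian determinant equal to 1 is the local form of Liouville's statement that $A$ and $T_t(A)$ have equal Lebesgue measure.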

Let us assume that the energy surface $\Omega$ is compact and
sufficiently “regular” so that the elementary theory of surface
integration is applicable and assume also that on $\Omega$

(2.5) $\|\nabla H\|^2 = \sum_{i=1}^{n}\left[\left(\dfrac{\partial H}{\partial p_i}\right)^2 + \left(\dfrac{\partial H}{\partial q_i}\right)^2\right] > c > 0.$

Let $B \subset \Omega$ be a set on the energy surface such that

$\int_{B} \dfrac{d\sigma}{\|\nabla H\|},$

where $d\sigma$ is the surface element, is defined. We define the
measure $\mu\{B\}$ of $B$ by the formula

(2.6) $\mu\{B\} = \dfrac{\displaystyle\int_{B} \dfrac{d\sigma}{\|\nabla H\|}}{\displaystyle\int_{\Omega} \dfrac{d\sigma}{\|\nabla H\|}}$

so that

(2.7) $\mu\{\Omega\} = 1.$

It now follows from Liouville’s theorem above, by simple
geometric considerations, that

(2.8) $\mu\{T_t(B)\} = \mu\{B\}.$

In other words $T_t$ preserves the measure $\mu$ on $\Omega$.

Formula (2.6) assigns measures only to certain elementary
sets (to which the elementary theory of surface integration
is applicable). However, the measure can be extended
to a much wider collection of sets in the same way
as, starting from intervals on the real line and defining the
measure of an interval to be its length, one builds up the
completely additive measure of Lebesgue.

In particular, a set $C$ is of $\mu$-measure 0 if for every
$\epsilon > 0$ there is a finite or denumerable collection $B_i$ of elementary
sets such that

$C \subset \bigcup_{i} B_i$ and $\sum_{i} \mu\{B_i\} < \epsilon.$
We can now state in precise terms Poincaré’s theorem
invoked by Zermelo.

If $B$ is $\mu$-measurable then almost all $P_0 \in B$ (i.e., except
for a set of $\mu$-measure 0) have the property that for some $t$
(depending possibly on $P_0$) $T_t(P_0) \in B$.

3. Boltzmann’s reply. To understand Boltzmann’s 
reply let us go back to our example of the two containers. 
Suppose that we know the precise functional form of the 
Hamiltonian 

(3.1) $H(q_1, \ldots, q_n, p_1, \ldots, p_n)$

and its value $C$ at $t = 0$. Thus we know the energy surface
(its equation is $H = C$).

There is clearly a set $B$ of points of $\Omega$, corresponding to the
condition that at $t = 0$ all the particles are in one of the
two containers, and we know that our system starts from
the set $B$.

The first assertion of Boltzmann was that the $\mu$-measure
$\mu\{B\}$ of $B$ is “extremely” small, corresponding to our intuition
that we are starting from a highly unusual or rare
state. On the other hand the set $R$ of points of $\Omega$ corresponding
to states in which the number of particles in
the two containers are “very nearly” proportional to the
volumes of the two containers, is such that $\mu\{R\}$ is “extremely”
close to 1.

Of course, these statements depend to a large extent on
the meanings of “extremely” and “very nearly,” but suffice
it to say that because of the enormity of the number of
atoms per cubic centimeter (of the order of $10^{20}$) it is quite
safe to interpret “extremely” as being less than $10^{-10}$ and
“very nearly” as being within $10^{-10}$ of the proper ratio.

The second assertion was much more daring. Boltzmann
argued that the first assertion implies that the relative
times which the actual curve describing the motion of
the system spends in $B$ and $R$ are respectively “extremely”
small and “extremely” large.

In other words, the system in an unusual state will almost
immediately leave it (though by Poincaré’s theorem
it will almost surely return to it eventually), and once in a
set corresponding to “nearly normal” states it will stay
there “essentially” forever.

Boltzmann dealt with the first assertion by plausible
but not very rigorous estimates. To justify the second assertion
he introduced a hypothesis that the curve representing
the motion of the system passes through every point of
the energy surface.

This hypothesis which Boltzmann called the ergodic hypothesis
(Ergodenhypothese) is false (except for $n = 1$
when it is trivial).

Boltzmann tried to salvage his explanation by replacing
the wrong ergodic hypothesis by what he called the “quasi-ergodic
hypothesis.” This new hypothesis postulated that
the curve of motion passes arbitrarily close to every point
on the energy surface. This, though highly plausible, is
not sufficient to establish a connection between the relative
time spent in a set $A \subset \Omega$ and its $\mu$-measure, $\mu\{A\}$.

Clearly it is the connection between the relative time
spent in $A$ and $\mu\{A\}$ that is the crux of the matter.

But what do we mean by the relative time spent in $A$?
The definition suggests itself almost immediately. Let
$t(\tau, P_0, A)$ denote the time the curve of motion starting
from $P_0$ spends in $A$ up to time $\tau$. The relative time is then
the limit

$\lim_{\tau \to \infty} \dfrac{t(\tau, P_0, A)}{\tau}$

if, of course, it exists.

It turns out that the proof of existence of this limit constitutes
the real difficulty. Once this is done one needs
only an additional assumption on $T_t$ to conclude that the
limit is equal to $\mu\{A\}$.

4. The abstract formulation. Now that I have 
dwelt at such length on the background of statistical me- 
chanics I shall proceed to disregard most of it and abstract 
from it its purely mathematical content. 

Instead of the energy surface I take a set $\Omega$ (of total
measure 1) on which a completely additive measure $\mu$ is
given.

I now assume that there is given a one-parameter family
of transformations $T_t$ of $\Omega$ onto itself which preserve the
$\mu$-measure. This statement requires a word of comment.
In dynamics the transformations $T_t$ are one-to-one (this
is an immediate consequence of uniqueness of solutions of
Hamiltonian equations of motion). It is, however, not
necessary to assume that $T_t$ are one-to-one if one properly
defines what is meant by measure preserving.

The proper definition is as follows: Let $T_t^{-1}(A)$ be the
inverse image of the set $A$; i.e.,

(4.1) $T_t(T_t^{-1}(A)) = A.$

The transformation $T_t$ is said to be measure preserving if

(4.2) $\mu\{T_t^{-1}(A)\} = \mu\{A\}.$

For one-to-one transformations (4.2) is clearly equivalent
to the usual definition of preservation of measure; i.e.,

(4.3) $\mu\{T_t(A)\} = \mu\{A\}.$

Let now $P_0 \in \Omega$ and $g(P)$ the characteristic function of
the measurable set $A$; i.e.,

(4.4) $g(P) = \begin{cases} 1, & P \in A, \\ 0, & P \notin A. \end{cases}$

It is now clear that $t(\tau, P_0, A)$ is given by the formula

(4.5) $t(\tau, P_0, A) = \int_0^{\tau} g(T_t(P_0))\, dt,$

and the problem is the existence of the limit

(4.6) $\lim_{\tau \to \infty} \dfrac{1}{\tau}\int_0^{\tau} g(T_t(P_0))\, dt.$

Together with this version, in which the time varies
continuously, it is convenient to consider a discrete version.
Let $T$ be a measure-preserving transformation; i.e.,

(4.7) $\mu\{T^{-1}(A)\} = \mu\{A\},$

and consider its powers (iterations) $T^2, T^3, \ldots$.

The analogue of the limit (4.6) is now

(4.8) $\lim_{n \to \infty} \dfrac{1}{n}\sum_{k=1}^{n} g(T^k(P_0)).$

In 1931 G. D. Birkhoff succeeded in proving that the
limits (4.6) and (4.8) exist for almost every $P_0$ (in the
sense of $\mu$-measure). A little earlier John von Neumann
proved that the limits (4.6) and (4.8) exist in the sense of
mean square.

There are now various proofs of these theorems, the
shortest being one given by F. Riesz. We shall omit the
proof, referring the reader to an excellent booklet of P. R.
Halmos, Lectures on Ergodic Theory, published by the
Mathematical Society of Japan.

What can one say about the limit (4.8) [or (4.6)]?
Denoting this limit by $h(P_0)$ we see immediately that it
is $\mu$-measurable, bounded (in fact, $0 \le h(P_0) \le 1$), and



88 


STATISTICAL INDEPENDENCE 


such that for almost every P 0 

(4.9) h(T(P 0 )) = h(P 0 ). 

Let now $H_\alpha$ be the set of $P_0$'s for which

$h(P_0) < \alpha,$

and let $Q \in T^{-1}(H_\alpha)$. Thus $T(Q) \in H_\alpha$, and hence

$h(T(Q)) < \alpha.$

Since, for almost every $Q$, $h(T(Q)) = h(Q)$, we see that
$h(Q) < \alpha$ except for a set of $Q$'s of $\mu$-measure zero.
Consequently, except for a set of $\mu$-measure zero,

$T^{-1}(H_\alpha) = H_\alpha$

for every $\alpha$ (the exceptional set may, of course, depend on
$\alpha$).

In other words, the sets $H_\alpha$ are invariant (up to sets of
measure zero).

A transformation is called "metrically transitive" if the
only invariant sets are either of measure zero or one.

If we assume that our transformation $T$ is metrically
transitive we see that all sets $H_\alpha$ are either of measure zero
or one, and hence $h(P_0)$ is constant almost everywhere.

The value of this constant is readily determined by
noting that

$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} g(T^k(P_0)) = h(P_0)$ (a.e.)

implies (by the theorem on bounded convergence) that

(4.10) $\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \int_\Omega g(T^k(P_0))\,d\mu = \int_\Omega h(P_0)\,d\mu.$

In fact,

$\int_\Omega g(T^k(P_0))\,d\mu = \int_\Omega g(P_0)\,d\mu = \mu\{A\}$

(this is an immediate consequence of the fact that $T$ is
measure preserving), and hence

$\int_\Omega h(P_0)\,d\mu = \mu\{A\}.$

Thus the constant is equal to $\mu\{A\}$.

Combining all this we can say that, if $T$ is metrically
transitive, then for almost all $P_0$

(4.11) $\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} g(T^k(P_0)) = \mu\{A\}.$

This can be easily generalized as follows:

If $f(P_0)$ is $\mu$-integrable, i.e.,

$\int_\Omega |f(P_0)|\,d\mu < \infty,$

and if $T$ is metrically transitive, then for almost all $P_0$'s

(4.12) $\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} f(T^k(P_0)) = \int_\Omega f(P_0)\,d\mu.$
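The statement (4.12) can be illustrated numerically. The following is only a sketch under stated assumptions: the map used, rotation of the circle $[0, 1)$ by an irrational angle, is not discussed in the text and is chosen here because it preserves Lebesgue measure and is metrically transitive. The names `ALPHA`, `orbit_point`, `time_average` and `indicator` are hypothetical helpers introduced for this example.

```python
import math

# Hypothetical example map (an assumption, not from the text): rotation of
# the circle [0, 1) by an irrational angle, T(x) = x + ALPHA (mod 1).
# It preserves Lebesgue measure and is metrically transitive.
ALPHA = (math.sqrt(5) - 1) / 2


def orbit_point(x0, k):
    """Return T^k(x0) for the rotation, computed directly to limit rounding drift."""
    return (x0 + k * ALPHA) % 1.0


def time_average(f, x0, n):
    """Return (1/n) * sum_{k=1}^{n} f(T^k(x0)), the average appearing in (4.12)."""
    return sum(f(orbit_point(x0, k)) for k in range(1, n + 1)) / n


def indicator(lo, hi):
    """Characteristic function g of the interval A = [lo, hi), as in (4.4)."""
    return lambda x: 1.0 if lo <= x < hi else 0.0


if __name__ == "__main__":
    n = 100_000
    # Time average of g; to be compared with mu{A} = 0.3 as in (4.11).
    print(time_average(indicator(0.2, 0.5), 0.1, n))
    # Time average of f(x) = x^2; to be compared with the space average 1/3.
    print(time_average(lambda x: x * x, 0.1, n))
```

For a finite $n$ the two printed values only approximate the space averages; the theorem concerns the limit, and only for almost every starting point.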

One might think that the proof of (4.12) vindicates completely
Boltzmann's views. Unfortunately the transformations
$T_t$ to which we are led in dynamics are so complex
that, except for some very simple cases, it is not known
whether they are metrically transitive or not. This, however,
in no way detracts from the beauty and importance
of the ergodic theorem (4.12).

5. The ergodic theorem and continued fractions. 

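Let $x$, $0 < x < 1$, be a real number and let us expand it
in a simple continued fraction

$x = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}$

where $a_1, a_2, \dots$ are positive integers. It is easy to derive
formulas for the $a$'s.

We have

$a_1(x) = \left[\frac{1}{x}\right], \quad a_2(x) = \left[\frac{1}{\frac{1}{x} - \left[\frac{1}{x}\right]}\right], \quad \dots,$
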
where as usual [y] denotes the greatest integer less than or 
equal to y. 

The formulas for the a’s become progressively more and 
more complicated but a little thought will show that they 
can be fitted into the following pattern. 

Let

(5.2) $T(x) = \frac{1}{x} - \left[\frac{1}{x}\right];$

then

(5.3) $a_2(x) = a_1(T(x)),$

(5.4) $a_3(x) = a_2(T(x)) = a_1(T^2(x)),$
etc.
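The pattern (5.2)-(5.4) translates directly into a procedure for computing the $a$'s. The sketch below assumes exact rational arithmetic (Python's `Fraction`) so that no rounding occurs; the function names `gauss_map`, `a1` and `partial_quotients` are hypothetical, introduced only for this illustration. For a rational $x$ the expansion terminates, which the loop handles by stopping when $T^k(x) = 0$.

```python
from fractions import Fraction
from math import floor


def gauss_map(x):
    """T(x) = 1/x - [1/x], the transformation (5.2)."""
    y = 1 / x
    return y - floor(y)


def a1(x):
    """First partial quotient a_1(x) = [1/x]."""
    return floor(1 / x)


def partial_quotients(x, n):
    """Return a_1(x), ..., a_n(x) using a_{k+1}(x) = a_1(T^k(x)).

    Stops early if some iterate T^k(x) equals 0 (x rational).
    """
    digits = []
    for _ in range(n):
        if x == 0:
            break
        digits.append(a1(x))
        x = gauss_map(x)
    return digits


if __name__ == "__main__":
    x = Fraction(13, 31)
    print(partial_quotients(x, 10))  # [2, 2, 1, 1, 2]
```

Here $13/31 = 1/(2 + 1/(2 + 1/(1 + 1/(1 + 1/2))))$, and the second digit agrees with $a_1(T(x))$ for $T(x) = 5/13$, as (5.3) requires.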

The possibility of applying the ergodic theorem becomes 
evident now since we are dealing here with iterations of 
the transformation T(x) given by (5.2). 

What is the space $\Omega$? Simply the interval (0, 1) with 0
excluded.
What is the invariant measure? This is more difficult
to answer but was, in essence, done already by Gauss.

One can proceed as follows: Let $p(x)$, $0 < x < 1$, be
such that

(5.5) (a) $p(x) \ge 0$, (b) $\int_0^1 p(x)\,dx = 1,$

and let us define $\mu\{A\}$ by the formula

(5.6) $\mu\{A\} = \int_A p(x)\,dx.$


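Take now an interval $(\alpha, \beta)$, $0 < \alpha < \beta < 1$, and consider
its inverse image under the transformation $T(x)$.

We have

(5.7) $T^{-1}(\alpha, \beta) = \bigcup_{k=1}^{\infty} \left( \frac{1}{k+\beta}, \frac{1}{k+\alpha} \right),$

and hence

(5.8) $\mu\{T^{-1}(\alpha, \beta)\} = \sum_{k=1}^{\infty} \int_{1/(k+\beta)}^{1/(k+\alpha)} p(x)\,dx.$

If $\mu$ is to be preserved we must have

(5.9) $\int_{\alpha}^{\beta} p(x)\,dx = \sum_{k=1}^{\infty} \int_{1/(k+\beta)}^{1/(k+\alpha)} p(x)\,dx$

for all $\alpha$ and $\beta$.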

We do not know how to go systematically about solving
(5.9). But it is easy to verify that

(5.10) $p(x) = \frac{1}{\log 2}\,\frac{1}{1+x}$

is a solution and satisfies conditions (5.5).
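The verification can be sketched as follows. The partial-sum index $K$ is introduced here only for the computation; the sum telescopes, and condition (5.5)(b) follows from $\int_0^1 dx/(1+x) = \log 2$.

```latex
\begin{align*}
\int_{1/(k+\beta)}^{1/(k+\alpha)} \frac{1}{\log 2}\,\frac{dx}{1+x}
  &= \frac{1}{\log 2}\left[\log\frac{k+\alpha+1}{k+\alpha}
     - \log\frac{k+\beta+1}{k+\beta}\right], \\
\sum_{k=1}^{K} \int_{1/(k+\beta)}^{1/(k+\alpha)} \frac{1}{\log 2}\,\frac{dx}{1+x}
  &= \frac{1}{\log 2}\left[\log\frac{K+1+\alpha}{1+\alpha}
     - \log\frac{K+1+\beta}{1+\beta}\right] \\
  &\xrightarrow[K\to\infty]{} \frac{1}{\log 2}\,\log\frac{1+\beta}{1+\alpha}
   = \int_{\alpha}^{\beta} \frac{1}{\log 2}\,\frac{dx}{1+x}.
\end{align*}
```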

This is all one needs except to check that $T(x)$ is metrically
transitive, and this is entirely trivial.¹

¹ See footnote on page 94.

If $f(x)$ is $\mu$-integrable, i.e.,

(5.11) $\frac{1}{\log 2} \int_0^1 |f(x)|\,\frac{dx}{1+x} < \infty,$

then by (4.12)

(5.12) $\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k(x)) = \frac{1}{\log 2} \int_0^1 f(x)\,\frac{dx}{1+x}$

for almost every $x$ (note that sets of $\mu$-measure 0 are
identical with sets of ordinary Lebesgue measure 0).

Let now

(5.13) $f(x) = \log a_1(x).$

We obtain now from (5.12) that for almost all $x$

(5.14) $\lim_{n\to\infty} (a_1 a_2 \cdots a_n)^{1/n} = C,$

where

(5.15) $C = \exp\left( \frac{1}{\log 2} \int_0^1 \log a_1(x)\,\frac{dx}{1+x} \right) = \exp\left( \frac{1}{\log 2} \sum_{k=1}^{\infty} \log k \,\log \frac{(k+1)^2}{k(k+2)} \right).$

This remarkable theorem was first proved (by a different
method) by Khintchine in 1935. The presented proof is
due to C. Ryll-Nardzewski.
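The constant $C$ of (5.15) and the limit (5.14) can both be explored numerically. The following is a rough sketch under stated assumptions: the series is simply truncated (it converges slowly, so only a few digits are reliable), and the "typical" $x$ is stood in for by a rational with a random 4000-bit numerator, of which only the first 1000 partial quotients are used. The helper names and the choice of seed are hypothetical.

```python
import math
import random


def khinchin_series(terms):
    """Approximate C of (5.15) by truncating the series after `terms` terms."""
    total = 0.0
    for k in range(1, terms + 1):
        # log((k+1)^2 / (k(k+2))) = log(1 + 1/(k(k+2)))
        total += math.log(k) * math.log1p(1.0 / (k * (k + 2)))
    return math.exp(total / math.log(2))


def partial_quotients(p, q, n):
    """First n partial quotients a_1, ..., a_n of the rational p/q in (0, 1)."""
    digits = []
    while p != 0 and len(digits) < n:
        a, r = divmod(q, p)  # a = [1/x]; r/p = T(x) for x = p/q
        digits.append(a)
        p, q = r, p
    return digits


def geometric_mean(digits):
    """Return (a_1 a_2 ... a_n)^(1/n), computed through logarithms."""
    return math.exp(sum(math.log(a) for a in digits) / len(digits))


if __name__ == "__main__":
    print(khinchin_series(200_000))      # truncated value of C
    rng = random.Random(0)               # fixed seed, an arbitrary choice
    bits = 4000
    p = rng.getrandbits(bits) | 1        # odd numerator, so 0 < p/q < 1
    digits = partial_quotients(p, 1 << bits, 1000)
    print(geometric_mean(digits))        # finite-n stand-in for (5.14)
```

The commonly quoted value of $C$ is about 2.6854. The empirical geometric mean over 1000 digits fluctuates noticeably around it, which is to be expected since (5.14) is a statement about the limit.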

I could have easily spared the reader the first three sections
of this chapter. I could have started with the abstract
formulation of § 4 and have avoided any mention
of dynamics and kinetic theory.

But had I done this I would have suppressed the most
exciting and, to my mind, most instructive part of the
story, for the road from kinetic theory as conceived by
Boltzmann and others to continued fractions is a superb
example of the often forgotten fact that mathematics is
not a separate entity but that it owes a great deal of its
power and its beauty to other disciplines.


PROBLEMS 


1. Let $B \subset \Omega$ be $\mu$-measurable and $\mu\{B\} \ne 0$. If $T$ is measure
preserving (but not necessarily metrically transitive), prove that for
almost every $P_0 \in B$ there is an integer $n \ge 1$ such that $T^n(P_0) \in B$.
(This is the discrete version of Poincaré's theorem; to prove it, consider
the set $C \subset B$ such that if $P_0 \in C$ then $T^n(P_0) \notin B$ for
$n = 1, 2, \dots$. Show then that $C$ is $\mu$-measurable and that $C$, $T^{-1}(C)$,
$T^{-2}(C)$, $\dots$ are all disjoint.)

2. Let $n(P_0)$, $P_0 \in B$, be the first positive integer such that
$T^{n(P_0)}(P_0) \in B$. If $T$ (in addition to being measure preserving)
is metrically transitive, prove that

$\int_B n(P_0)\,d\mu = 1.$


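3. Let $x$, $0 < x < 1$, be expanded in a continued fraction

$x = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}}$

and let $B$ be the set on which $a_1(x) = k$ (i.e., $1/(k+1) < x \le 1/k$).
Let $n(x, k)$ denote the least integer greater than 1 such that $a_{n(x,k)} = k$.
Show that

$\frac{1}{\log 2} \int_{1/(k+1)}^{1/k} \bigl(n(x, k) - 1\bigr)\,\frac{dx}{1+x} = 1.$
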

4. Let 0 < x < 1 and T(x) = 2x — [2x]. Derive Borel’s theorem
of Chapter 2 by an application of the ergodic theorem.

¹ Here I was carried away by a burst of inexplicable optimism.
While the fact is intuitively obvious the proof is, alas, not. In the
paper by Ryll-Nardzewski cited at the end of this chapter the proof
is given in § 2. It is due to K. Knopp.



